[mvapich-discuss] Core binding oversubscription with batch schedulers
Mark Dixon
m.c.dixon at leeds.ac.uk
Thu Oct 10 09:40:01 EDT 2013
On Thu, 12 Sep 2013, Jonathan Perkins wrote:
...
> Can you elaborate on this? My understanding is that gridengine has an
> rsh replacement which when used by mpirun_rsh will result in the desired
> behavior that the MPI Library will only see the cpuset allocated by grid
> engine on each node (may use numactl under the hood). Is this correct?
(sorry for the delay - suffered from email overload for the past month)
Correct.
> If so, you can activate this rsh replacement by setting the environment
> variable RSH_CMD to the appropriate command at configure time.
Yes; we used to do something similar to that when we used mpirun_rsh.
However, we recently switched to using hydra: among other advantages, its
support for gridengine means we no longer need to convert the hostfile to
something mpirun_rsh understands.
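
For readers who have not had to do the hostfile conversion mentioned above,
here is a minimal sketch of what it typically involves. It assumes the
gridengine parallel-environment hostfile has whitespace-separated lines of
the form "host slots queue processor-range" and that the launcher accepts
one hostname per line, repeated once per slot. The function name and the
sample hostnames are hypothetical and not taken from any site's scripts.

```python
def pe_hostfile_to_host_list(pe_hostfile_text):
    """Expand 'host slots queue range' lines into one hostname per slot.

    Assumption: the first field is the hostname and the second is the
    number of slots granted on that host; remaining fields are ignored.
    """
    hosts = []
    for line in pe_hostfile_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, slots = fields[0], int(fields[1])
        hosts.extend([host] * slots)
    return hosts


# Hypothetical sample input, for illustration only.
sample = (
    "node01 2 all.q@node01 UNDEFINED\n"
    "node02 1 all.q@node02 UNDEFINED\n"
)
converted = pe_hostfile_to_host_list(sample)
print("\n".join(converted))
```

A launcher with native gridengine support reads the scheduler's hostfile
itself, so a step like this is no longer needed.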
...
> We do in fact use only the cores inherited by its environment. It
> sounds like the method that you have tried using in grid engine does not
> actually set up the environment but is giving us hints. I think using
> a method where the shell sets up the affinity ahead of time is desirable
> but we can also consider adding support in mpirun_rsh to extend its
> hostfile format to accommodate this.
Depending on the submission flags given by the user, gridengine has
traditionally supported the following:
* environment - the core assignment is placed in an environment variable
(no core binding is actually done)
* parallel environment - the core assignment is placed in the gridengine
hostfile (no core binding is actually done)
* set - the core assignment is applied via libnuma.so's affinity routines,
which can be overridden (e.g. by the numactl command).
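
To illustrate the "environment" case, where the scheduler only passes a
hint, here is a minimal sketch of how a wrapper might read that hint. The
variable name SGE_BINDING and its format (a whitespace-separated list of
core numbers) are assumptions for this example and may differ between
gridengine versions and forks; the function name is hypothetical.

```python
def cores_from_binding_hint(environ, var_name="SGE_BINDING"):
    """Return the core numbers listed in a binding hint variable.

    Assumption: the variable holds whitespace-separated integers.
    Tokens that are not plain integers are ignored. An unset or empty
    variable yields an empty list, meaning "no hint was given".
    """
    raw = environ.get(var_name, "")
    return [int(tok) for tok in raw.split() if tok.isdigit()]


# Illustration with a made-up environment; a real wrapper would pass
# os.environ instead.
hint = cores_from_binding_hint({"SGE_BINDING": "0 1 4 5"})
print(hint)
```

Nothing in this mode restricts the process: unless the job or the MPI
launcher acts on the hint, it remains free to run on any core.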
"set" is the one I was testing: launching MVAPICH2 via Hydra completely
stomped over the affinities set up by gridengine.
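
As a sketch of the behaviour one would hope for in the "set" case, the
example below picks a core for a rank only from the affinity mask the
process inherited, rather than from every core on the node. It uses
Python's os.sched_getaffinity and os.sched_setaffinity (Linux only) and
is not how MVAPICH2 or Hydra are implemented; the function names and the
round-robin policy are hypothetical.

```python
import os


def pick_core(allowed, local_rank):
    """Choose a core for local_rank from the allowed set, round-robin."""
    cores = sorted(allowed)
    if not cores:
        raise ValueError("empty affinity mask")
    return cores[local_rank % len(cores)]


def bind_within_inherited(local_rank):
    """Narrow this process to one core inside its inherited mask."""
    inherited = os.sched_getaffinity(0)  # cores the scheduler left us
    core = pick_core(inherited, local_rank)
    os.sched_setaffinity(0, {core})      # never leaves the inherited set
    return core


# Show which core rank 0 would get, without changing any binding.
if hasattr(os, "sched_getaffinity"):
    print(pick_core(os.sched_getaffinity(0), local_rank=0))
```

A launcher that instead numbers cores from zero on every node ignores the
inherited mask, which is how two jobs sharing a node can end up bound to
the same cores.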
...
> This type of issue may affect a decent number of users so there is no
> harm in discussing this here. Thanks for your note. I hope that we can
> find an acceptable solution for your situation.
...
Thanks :)
Since the Oracle takeover of Sun and subsequent forking of the software,
gridengine turned into a hard target to track. The description above
essentially describes the situation pre-fork and probably describes the
lowest common denominator today.
Some of the forks have since moved on to trying to use cpusets instead of
libnuma's affinity routines to perform mandatory restrictions, but the
situation is still in flux. I'm currently investigating these.
Cheers,
Mark
--
-----------------------------------------------------------------
Mark Dixon Email : m.c.dixon at leeds.ac.uk
HPC/Grid Systems Support Tel (int): 35429
Information Systems Services Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------