[mvapich-discuss] Core binding oversubscription with batch schedulers
Mark Dixon
m.c.dixon at leeds.ac.uk
Wed Sep 11 12:00:20 EDT 2013
Hi,
I'm using MVAPICH2 1.9 on a Linux/Intel cluster where compute nodes can be
shared between jobs (but cores are not oversubscribed), and I'm having a few
problems with core binding. We use gridengine to assign and launch jobs.
By default, an MPI application built with MVAPICH2 appears to bind
processes to cores linearly - core 0 gets the 1st rank on that host, core
1 gets the 2nd and so on. As discussed previously on this list, this is
not appropriate behaviour when there is more than one job on the same
node, and there are some environment variables to help out with this.
Unfortunately, those variables seem to allow only a single mapping, applied
identically on all hosts. That does not cover the common case where the
cores that are "yours" vary from one compute node to another.
Batch schedulers can launch processes on each of the compute nodes with
the right NUMA settings for that host (as seen by the likes of numactl).
What would be really useful is for MVAPICH2 to notice which cores it has
"inherited", and to assign cores only out of that pool.
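To illustrate the behaviour I have in mind, here is a minimal sketch in
Python, not anything MVAPICH2 does today as far as I know. It assumes a
Linux host, where os.sched_getaffinity reports the CPU set the process
inherited from its launcher. The function names and the local_rank
argument (the rank's index among the processes on that host) are
hypothetical and only there for illustration.

```python
import os


def pick_core(local_rank, allowed=None):
    """Return a core ID for this rank, chosen only from the inherited pool.

    local_rank is a hypothetical per-host rank index (0, 1, 2, ...).
    allowed defaults to the affinity mask inherited from the launcher.
    """
    if allowed is None:
        # Linux-only call: the set of CPUs this process may run on, as left
        # in place by the scheduler (e.g. via numactl or taskset).
        allowed = os.sched_getaffinity(0)
    pool = sorted(allowed)
    if not pool:
        raise ValueError("inherited affinity mask is empty")
    # Wrap around if there are more local ranks than inherited cores.
    return pool[local_rank % len(pool)]


def bind_to_inherited_core(local_rank):
    """Pin the calling process to one core taken from its inherited mask."""
    core = pick_core(local_rank)
    os.sched_setaffinity(0, {core})
    return core
```

For example, if the scheduler handed a job cores 4-7 on one host and
cores 0-3 on another, local ranks 0 to 3 would land on 4, 5, 6, 7 on the
first host and 0, 1, 2, 3 on the second, with no per-host configuration.
Doing this by hand inside an application would presumably also need
MVAPICH2's own binding switched off (MV2_ENABLE_AFFINITY=0), which is why
I would rather the library handled it.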
Is this something that MVAPICH2 can already do, and I've just not read the
documentation properly?
Thanks,
Mark
--
-----------------------------------------------------------------
Mark Dixon Email : m.c.dixon at leeds.ac.uk
HPC/Grid Systems Support Tel (int): 35429
Information Systems Services Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------