[mvapich-discuss] Core binding oversubscription with batch schedulers

Mark Dixon m.c.dixon at leeds.ac.uk
Wed Sep 11 12:00:20 EDT 2013


Hi,

I'm using MVAPICH2 1.9 on a Linux/Intel cluster where compute nodes can be 
shared between jobs (but cores are not oversubscribed), and I'm having a 
few problems with core binding. We use gridengine to allocate and launch 
jobs.

By default, an MPI application built with MVAPICH2 appears to bind 
processes to cores linearly - core 0 gets the 1st rank on that host, core 
1 gets the 2nd and so on. As discussed previously on this list, this is 
not appropriate behaviour when there is more than one job on the same 
node, and there are some environment variables to help out with this.

Unfortunately, those variables only seem to allow the same mapping to be 
applied on every host. That still does not cover the common case where the 
cores that are "yours" vary from one compute node to another.
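
To make the limitation concrete, here is a small Python sketch. The host 
names and core sets are hypothetical, and I'm assuming the variable in 
question is MV2_CPU_MAPPING in its simplest colon-separated form, with one 
core per local rank. It checks a single mapping string against the cores 
granted on each host:

```python
# Hypothetical allocation: cores the scheduler granted this job on each host.
granted = {
    "node01": {0, 1, 2, 3},
    "node02": {4, 5, 6, 7},
}


def parse_mapping(mapping):
    """Parse a colon-separated, one-core-per-rank string such as "0:1:2:3"."""
    return [int(core) for core in mapping.split(":")]


def misplaced_ranks(mapping, allowed):
    """Return (local_rank, core) pairs that fall outside the granted cores."""
    return [
        (rank, core)
        for rank, core in enumerate(parse_mapping(mapping))
        if core not in allowed
    ]


# One mapping string applied to every host, as a global variable would be.
mapping = "0:1:2:3"
for host, allowed in sorted(granted.items()):
    print(host, misplaced_ranks(mapping, allowed))
# node01 []
# node02 [(0, 0), (1, 1), (2, 2), (3, 3)]
```

The mapping that is correct on node01 puts every rank on node02 onto 
cores that belong to another job.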

Batch schedulers can launch processes on each of the compute nodes with 
the right NUMA settings for that host (as seen by the likes of numactl). 
What would be really useful is for MVAPICH2 to notice which cores it has 
"inherited", and to assign cores only out of that pool.
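
The behaviour being asked for could look roughly like the following Python 
sketch. It is not how MVAPICH2 works; it only illustrates the idea using 
the Linux affinity calls exposed by Python's os module. The function names 
are made up for illustration, and how a process learns its local rank on 
the host is left as a parameter:

```python
import os


def inherited_cores():
    """Cores this process may run on, as inherited from its parent (Linux only)."""
    return sorted(os.sched_getaffinity(0))


def pick_core(allowed, local_rank):
    """Choose the local_rank-th core of the pool, wrapping if ranks outnumber cores."""
    if not allowed:
        raise ValueError("empty affinity mask")
    return allowed[local_rank % len(allowed)]


def bind_to_inherited(local_rank):
    """Bind the calling process to one core taken from the inherited pool."""
    core = pick_core(inherited_cores(), local_rank)
    os.sched_setaffinity(0, {core})
    return core
```

If the scheduler launched the job with cores 4-7 on a host, local rank 0 
would land on core 4, rank 1 on core 5, and so on, whatever the 
allocation is on other hosts. The wrap-around only matters if more ranks 
than cores are started, which should not happen here. An application doing 
this itself would presumably also need MVAPICH2's own binding switched off 
(MV2_ENABLE_AFFINITY=0) so that it is not overridden.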

Is this something that MVAPICH2 can do today and I've just not read the 
documentation properly?

Thanks,

Mark
-- 
-----------------------------------------------------------------
Mark Dixon                       Email    : m.c.dixon at leeds.ac.uk
HPC/Grid Systems Support         Tel (int): 35429
Information Systems Services     Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------