[mvapich-discuss] Core binding oversubscription with batch schedulers

Winkler, Ursula (ursula.winkler at uni-graz.at)
Thu Sep 12 01:38:35 EDT 2013


Hi Mark,

We have the same problem with Gridengine. We "solved" it by setting the environment variable MV2_ENABLE_AFFINITY to 0.

Ursula

----- Reply to -----
From: mvapich-discuss-bounces at cse.ohio-state.edu [mailto:mvapich-discuss-bounces at cse.ohio-state.edu] On Behalf Of Mark Dixon
Sent: Wednesday, 11 September 2013 18:00
To: mvapich-discuss at cse.ohio-state.edu
Subject: [mvapich-discuss] Core binding oversubscription with batch schedulers

Hi,

I'm using MVAPICH2 1.9 on a Linux/Intel cluster where compute nodes can be shared between jobs (but cores are not oversubscribed), and I have a few problems with core binding. We use Gridengine to assign and launch jobs.

By default, an MPI application built with MVAPICH2 appears to bind processes to cores linearly - core 0 gets the 1st rank on that host, core 1 gets the 2nd and so on. As discussed previously on this list, this is not appropriate behaviour when there is more than one job on the same node, and there are some environment variables to help out with this.
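
To make the collision concrete, here is a small Python model of the linear policy as described above. It is only an illustration of the observed behaviour, not MVAPICH2's actual binding code; the function name linear_binding and the two example jobs are made up for this sketch.

```python
def linear_binding(num_local_ranks):
    """Model of the linear policy: local rank i is bound to core i."""
    return list(range(num_local_ranks))

# Two independent 4-rank jobs placed on the same node (hypothetical example).
job_a_cores = linear_binding(4)
job_b_cores = linear_binding(4)

# Cores that both jobs end up bound to.
shared_cores = sorted(set(job_a_cores) & set(job_b_cores))
print("job A:", job_a_cores)
print("job B:", job_b_cores)
print("shared:", shared_cores)
```

Both jobs land on the same low-numbered cores, so on a node with more cores than that, the remaining cores sit idle even though the scheduler never oversubscribed the node.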

Unfortunately, those variables only seem to allow for the same mappings on all hosts. This still does not allow for the common case where the cores that are "yours" vary from one compute node to another.

Batch schedulers can launch processes on each of the compute nodes with the right NUMA settings for that host (as seen by the likes of numactl). What would be really useful would be for MVAPICH2 to notice which cores it has "inherited" and to assign cores only from that pool.
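
As a rough sketch of the behaviour being asked for, the Python below reads the CPU set the process inherited from its parent (Linux only, via os.sched_getaffinity) and picks one core from that set per local rank. This is not something MVAPICH2 is known to do from this thread; the helper names pick_core and bind_to_inherited_core are hypothetical, and how the local rank is obtained is deliberately left to the caller.

```python
import os

def pick_core(local_rank, allowed=None):
    """Choose a core for this local rank from the inherited CPU set.

    `allowed` defaults to the affinity mask the process was started with,
    e.g. one set by the batch scheduler or by numactl.
    """
    if allowed is None:
        allowed = os.sched_getaffinity(0)  # Linux-only call
    cores = sorted(allowed)
    # Wrap around if there are more local ranks than inherited cores.
    return cores[local_rank % len(cores)]

def bind_to_inherited_core(local_rank):
    """Pin the calling process to a single core out of its inherited set."""
    core = pick_core(local_rank)
    os.sched_setaffinity(0, {core})
    return core
```

With an inherited set of {4, 5, 6, 7}, local ranks 0 to 3 would be given cores 4 to 7, so a second job that inherited {0, 1, 2, 3} on the same node would not overlap with it.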

Is this something that MVAPICH2 can do today and I've just not read the documentation properly?

Thanks,

Mark
--
-----------------------------------------------------------------
Mark Dixon                       Email    : m.c.dixon at leeds.ac.uk
HPC/Grid Systems Support         Tel (int): 35429
Information Systems Services     Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss


