Why do I need MV2_CPU_MAPPING? (was Re: [mvapich-discuss] MV2_CPU_MAPPING doesn't work on all the nodes)

Craig Tierney craig.tierney at noaa.gov
Mon Jun 25 13:45:30 EDT 2012


Mvapich2 development team,

This thread got me thinking again about an issue I have always had with process affinity and mvapich2: why do we even need MV2_CPU_MAPPING at all?  For 99.9% of the MPI application cases I can think of, there is one correct way to lay out processes.  If I have the machine file, I would know what to do.  If I have OMP_NUM_THREADS, I would know what to do.  If for some reason I needed a different mapping on different hosts, I would be able to deduce the correct layout from the machine file (which we do need to do, and which MV2_CPU_MAPPING cannot support).
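To make the point concrete, here is a minimal sketch of how a library could build the mapping string itself from information it already has (the per-node rank count and OMP_NUM_THREADS).  This is not MVAPICH2 code; derive_cpu_mapping and its contiguous-block policy are my own assumptions, shown only to illustrate that the mapping is mechanically derivable:

```python
# Hypothetical sketch: derive an MV2_CPU_MAPPING-style string
# automatically from the per-node rank count and OMP_NUM_THREADS,
# instead of asking the user to write it by hand.

import os

def derive_cpu_mapping(ranks_per_node, cores_per_node, threads_per_rank=1):
    """Give each local rank a contiguous block of cores and return
    the mapping in MV2_CPU_MAPPING's colon-separated format
    (e.g. "0,1:2,3" for two ranks with two cores each)."""
    blocks = []
    for rank in range(ranks_per_node):
        start = rank * threads_per_rank
        cores = range(start, min(start + threads_per_rank, cores_per_node))
        blocks.append(",".join(str(c) for c in cores))
    return ":".join(blocks)

# Example: 4 ranks per node, 8 cores, thread count from the environment.
threads = int(os.environ.get("OMP_NUM_THREADS", "1"))
print(derive_cpu_mapping(ranks_per_node=4, cores_per_node=8,
                         threads_per_rank=threads))
```

With OMP_NUM_THREADS=2 this prints "0,1:2,3:4,5:6,7" - exactly the kind of string users currently have to construct manually.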

The only rule I can think of is: distribute processes evenly across sockets, and set memory affinity to try to stay local.  The only other choice is whether to use a scatter or a compact distribution, and that could be specified by a single variable.
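A rough sketch of what those two policies mean.  The core numbering here (socket 0 owns cores 0-3, socket 1 owns cores 4-7) is an assumption for illustration, not a universal layout, and the function is mine, not anything in MVAPICH2:

```python
# Hypothetical sketch of the two placement policies: "compact" fills
# one socket before starting the next, "scatter" alternates ranks
# round-robin across sockets.  Assumes socket s owns cores
# [s*cores_per_socket, (s+1)*cores_per_socket).

def place(n_ranks, sockets=2, cores_per_socket=4, policy="compact"):
    cores = []
    for rank in range(n_ranks):
        if policy == "compact":
            cores.append(rank)              # fill socket 0 first
        else:                               # scatter: round-robin
            socket = rank % sockets
            slot = rank // sockets
            cores.append(socket * cores_per_socket + slot)
    return cores

print(place(4, policy="compact"))  # [0, 1, 2, 3] - all on socket 0
print(place(4, policy="scatter"))  # [0, 4, 1, 5] - spread over both
```

Compact keeps communicating neighbors close (shared cache), scatter maximizes per-rank memory bandwidth; which is right depends on the application, which is why a single policy variable would cover it.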

Of course there are many things that could complicate my "simple" view of the world.  Besides launching multiple jobs on a node, what are they, and how often do they really happen? 
For multiple jobs, you would probably throw out affinity, because there would be no consistent way for a user to ensure that two jobs are not stepping on each other (you would need a 
batch system to do this).

Other MPI stacks do not require the use of something like MV2_CPU_MAPPING (Intel MPI and SGI MPI have methods built in).  Honestly, most of my users wouldn't get 
MV2_CPU_MAPPING right.  This is something the system should be doing for them, not something they should have to figure out for themselves.

Thanks,
Craig
