[mvapich-discuss] Disabling CPU affinity in PSM

Adam T. Moody moody20 at llnl.gov
Fri Feb 27 19:55:38 EST 2015


Hello MVAPICH team,
I've attached a patch with two modifications for the PSM channel:

    - disable PSM from setting CPU affinity
    - install PSM error handler to print more verbose error messages

By default during psm_ep_open(), PSM sets CPU affinity on a process if 
it's not already set.  However, the affinity assigned by PSM causes 
problems, especially for singleton MPI jobs, i.e., those run without 
mpirun or srun.  PSM binds each process based on its rank, so it binds 
every singleton job to core 0.  When multiple singleton jobs run on the 
same node, they all end up bound to the same core.
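
One way to observe this from inside a process is to ask the kernel which 
CPUs the process may run on.  The sketch below is not part of the patch.  
It assumes Linux with glibc, and allowed_cpus() is a hypothetical helper 
name chosen for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Return the number of CPUs the calling process may run on, or -1 on
 * error.  If print is nonzero, also list those CPUs on stdout.
 * Note: a fixed-size cpu_set_t covers CPU_SETSIZE (typically 1024) CPUs;
 * larger machines would need the dynamically sized CPU_ALLOC variants. */
int allowed_cpus(int print)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    /* pid 0 means "the calling thread" */
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_getaffinity");
        return -1;
    }

    if (print) {
        for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
            if (CPU_ISSET(cpu, &set))
                printf("%d ", cpu);
        }
        printf("\n");
    }
    return CPU_COUNT(&set);
}
```

Calling allowed_cpus(1) after MPI_Init in several singleton jobs on one 
node would be expected to list only core 0 in each of them if PSM has 
bound the process as described above.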

Typically, people rely on a process launcher like mpirun or srun to set 
CPU affinity for each MPI process.  Otherwise, they are most likely 
running singleton MPI jobs, in which case they probably don't want to 
bind all such jobs to the same core.  Someone who does want to bind a 
singleton job can use a command like taskset or numactl, which gives 
full control over which CPU the process is bound to.
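
For reference, explicit binding of the kind taskset performs can also be 
done from within a program.  This is a minimal sketch assuming Linux with 
glibc; bind_to_cpu() is a hypothetical helper, not something provided by 
PSM or MVAPICH.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Bind the calling thread to a single CPU.
 * Returns 0 on success, -1 on error (bad index or kernel refusal). */
int bind_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread"; fails with EINVAL if the CPU
     * is not one the process is permitted to use (e.g. cpuset limits) */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}
```

Because the affinity is already set by the time psm_ep_open() runs, PSM's 
default behavior described above should leave such a binding alone.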

The attached patch disables PSM affinity by specifying 
PSM_EP_OPEN_AFFINITY_SKIP as an option during psm_ep_open().

This patch also installs a PSM error handler to print more verbose PSM 
error messages.  Currently, our error messages do not provide enough 
context, so we often see the same message printed for what may be many 
different errors.  With this patch, each message includes an additional 
error string with more information provided by PSM.
-Adam
-------------- next part --------------
A non-text attachment was scrubbed...
Name: psm_affinity.patch
Type: text/x-patch
Size: 2687 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20150227/84d6dd7f/attachment-0001.bin>

