[mvapich-discuss] Disabling CPU affinity in PSM
Adam T. Moody
moody20 at llnl.gov
Fri Feb 27 19:55:38 EST 2015
Hello MVAPICH team,
I've attached a patch with two modifications for the PSM channel:
- prevent PSM from setting CPU affinity
- install a PSM error handler that prints more verbose error messages
By default, during psm_ep_open(), PSM sets CPU affinity on a process if
it is not already set. However, the affinity assigned by PSM causes
problems, especially for singleton MPI jobs, i.e., those run without
mpirun or srun. PSM binds each process based on its rank, and since a
singleton job is always rank 0, PSM binds every singleton job to core 0.
When multiple singleton jobs run on the same node, they all end up bound
to the same core.
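To make the collision concrete, here is a hypothetical model of rank-based binding. It is an illustration under the assumption that the core is chosen as `rank % ncores`; it is not PSM's actual source, and the helper names are made up for this sketch.

```c
/* Hypothetical model of rank-based binding (illustration only,
   NOT PSM's actual code): assume core = rank modulo core count. */
static int core_for_rank(int rank, int ncores)
{
    return rank % ncores;
}

/* Each singleton job is its own MPI_COMM_WORLD of size 1, so the
   only rank it ever has is 0. Record the core each job would get. */
static void bind_singleton_jobs(int njobs, int ncores, int *cores_out)
{
    for (int job = 0; job < njobs; job++) {
        const int rank = 0;  /* every singleton is rank 0 */
        cores_out[job] = core_for_rank(rank, ncores);
    }
}
```

Under this model, a 4-rank job started by a launcher spreads over cores 0 through 3, while four independent singletons on a 16-core node all land on core 0.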
Typically, people rely on a process launcher such as mpirun or srun
to set CPU affinity for each MPI process. Otherwise, they are most
likely running singleton MPI jobs, in which case they probably don't
want all such jobs bound to the same core. Someone who does want to
bind a singleton job can use a command like taskset or numactl, which
gives full control over which CPU the process is bound to.
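The binding that `taskset -c 3 ./app` or `numactl --physcpubind=3 ./app` applies from outside can also be done from inside a program with the Linux `sched_setaffinity(2)` call. A minimal Linux-only sketch follows; the helper names are hypothetical, and the "already restricted" check is an assumed heuristic, not necessarily the test PSM itself uses.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Pin the calling process (pid 0) to a single CPU, as `taskset -c <cpu>` would. */
static int bind_self_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Number of CPUs in the calling process's affinity mask, or -1 on error. */
static int allowed_cpu_count(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}

/* Lowest-numbered CPU in the current affinity mask, or -1 on error. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int c = 0; c < CPU_SETSIZE; c++)
        if (CPU_ISSET(c, &set))
            return c;
    return -1;
}

/* Assumed heuristic (not necessarily PSM's exact check): treat affinity as
   "already set" when the mask covers fewer CPUs than are online. */
static int affinity_already_restricted(void)
{
    long online = sysconf(_SC_NPROCESSORS_ONLN);
    int allowed = allowed_cpu_count();
    return allowed > 0 && allowed < online;
}
```

Once the process is pinned, `allowed_cpu_count()` reports 1, so on a multi-core node any library applying the "only if not already set" rule would leave the user's choice alone.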
The attached patch disables PSM affinity by specifying
PSM_EP_OPEN_AFFINITY_SKIP as an option during psm_ep_open().
This patch also installs a PSM error handler to print more verbose PSM
error messages. Currently, our error messages do not provide enough
context, so we often see the same message printed for what may be
many different errors. This patch prints an additional error string
with more information provided by PSM.
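Since the patch itself was scrubbed from the archive, here is a rough sketch of what the two changes might look like against the PSM API in `psm.h`. It is not the patch's code. The function names `verbose_errhandler` and `open_singleton_endpoint` are hypothetical. Generating a fresh UUID as the job key is only reasonable for a singleton, because a real MPI job must share one key across all ranks. Returning the error from the handler is assumed to pass it back to the caller unchanged. The fragment needs the PSM headers and library, so it cannot be built in isolation.

```c
#include <stdio.h>
#include <psm.h>  /* PSM header; requires the PSM library and hardware */

/* Hypothetical handler: print PSM's generic description of the error plus
   the context-specific string PSM supplies for this occurrence. */
static psm_error_t verbose_errhandler(psm_ep_t ep, const psm_error_t error,
                                      const char *error_string,
                                      psm_error_token_t token)
{
    (void)ep;
    (void)token;
    fprintf(stderr, "PSM error %d (%s): %s\n", (int)error,
            psm_error_get_string(error), error_string);
    return error;  /* assumed: hand the error back to the caller */
}

/* Hypothetical helper: open an endpoint without letting PSM set affinity. */
static int open_singleton_endpoint(psm_ep_t *ep, psm_epid_t *epid)
{
    int major = PSM_VERNO_MAJOR, minor = PSM_VERNO_MINOR;
    struct psm_ep_open_opts opts;
    psm_uuid_t job_key;
    psm_error_t err;

    if ((err = psm_init(&major, &minor)) != PSM_OK)
        return -1;
    psm_ep_open_opts_get_defaults(&opts);
    opts.affinity = PSM_EP_OPEN_AFFINITY_SKIP;  /* leave binding to launcher/taskset */
    psm_uuid_generate(job_key);                 /* singleton only; real jobs share a key */
    if ((err = psm_ep_open(job_key, &opts, ep, epid)) != PSM_OK) {
        fprintf(stderr, "psm_ep_open failed: %s\n", psm_error_get_string(err));
        return -1;
    }
    psm_error_register_handler(*ep, verbose_errhandler);
    return 0;
}
```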
-Adam
-------------- next part --------------
A non-text attachment was scrubbed...
Name: psm_affinity.patch
Type: text/x-patch
Size: 2687 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20150227/84d6dd7f/attachment-0001.bin>