[mvapich-discuss] Disabling CPU affinity in PSM

Hari Subramoni subramoni.1 at osu.edu
Sat Feb 28 12:35:41 EST 2015


Hi Adam,

Thanks for identifying the issue and providing the patch. We have taken it
into the code base. It should be available with our upcoming RC2 release.

Regards,
Hari.

On Fri, Feb 27, 2015 at 7:55 PM, Adam T. Moody <moody20 at llnl.gov> wrote:

> Hello MVAPICH team,
> I've attached a patch with two modifications for the PSM channel:
>
>    - disable PSM from setting CPU affinity
>    - install PSM error handler to print more verbose error messages
>
> By default, psm_ep_open() sets CPU affinity on a process if affinity is
> not already set.  However, the affinity assigned by PSM causes problems,
> especially for singleton MPI jobs, i.e., those run without mpirun or
> srun.  PSM binds each process based on its rank, so every singleton job
> is bound to core 0.  When multiple singleton jobs run on the same node,
> they all end up bound to that same core.
>
> Typically, people will rely on the process launcher like mpirun or srun to
> set CPU affinity for each MPI process.  Otherwise, they are most likely
> running singleton MPI jobs, in which case, they probably don't want to bind
> all such jobs to the same core.  If someone does want to bind a singleton
> job, one can use a command like taskset or numactl, which then gives one
> full control over which CPU the process is bound to.
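
For anyone who would rather set the binding from inside the program than
through taskset or numactl, here is a minimal sketch using the Linux
sched_setaffinity(2) and sched_getaffinity(2) calls. It is not part of the
patch; the helper names pin_to_core, allowed_core_count, and
first_allowed_core are made up for illustration, and the code assumes
Linux with glibc.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling process to one core, similar to `taskset -c <core>`.
 * Returns 0 on success, -1 on failure (errno is set by the kernel). */
int pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Number of cores the calling process is allowed to run on, or -1. */
int allowed_core_count(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}

/* Lowest-numbered core in the current affinity mask, or -1 on error. */
int first_allowed_core(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int c = 0; c < CPU_SETSIZE; c++) {
        if (CPU_ISSET(c, &set))
            return c;
    }
    return -1;
}
```

Calling pin_to_core() before MPI_Init means the process already has an
affinity mask by the time the endpoint is opened. Each singleton job can
then pick a different core instead of all of them landing on core 0.
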
>
> The attached patch disables PSM affinity by specifying
> PSM_EP_OPEN_AFFINITY_SKIP as an option during psm_ep_open().
>
> This patch also installs a PSM error handler to print more verbose PSM
> error messages.  Currently, our error messages do not provide enough
> context, so we often see the same message printed for what may be many
> different errors.  With this patch, we also print the additional error
> string that PSM provides.
> -Adam
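
To illustrate the error-handler idea described above, here is a small
self-contained sketch of the callback pattern. It deliberately does not
use psm.h: every type and function here (demo_err_t, demo_errhandler_t,
demo_register_handler, demo_raise, verbose_handler) is a hypothetical
stand-in, and the actual patch may be structured differently.

```c
#include <stdio.h>

/* Hypothetical stand-ins for illustration only; these are NOT the
 * declarations from psm.h. */
typedef int demo_err_t;
typedef demo_err_t (*demo_errhandler_t)(demo_err_t code, const char *detail);

static demo_errhandler_t installed_handler = NULL;
static char last_message[256];

/* Install a handler that the library side calls when an error occurs. */
void demo_register_handler(demo_errhandler_t handler)
{
    installed_handler = handler;
}

/* Verbose handler: report the code together with the detail string. */
demo_err_t verbose_handler(demo_err_t code, const char *detail)
{
    snprintf(last_message, sizeof(last_message), "PSM error %d: %s",
             code, detail ? detail : "(no detail)");
    fprintf(stderr, "%s\n", last_message);
    return code;
}

/* Library side: without a handler, only the bare code is reported. */
demo_err_t demo_raise(demo_err_t code, const char *detail)
{
    if (installed_handler != NULL)
        return installed_handler(code, detail);
    snprintf(last_message, sizeof(last_message), "PSM error %d", code);
    fprintf(stderr, "%s\n", last_message);
    return code;
}
```

Without a registered handler, two different failures that share a code
produce the same line. Once verbose_handler is installed, the detail
string distinguishes them.
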