[mvapich-discuss] Disabling CPU affinity in PSM

Jonathan Perkins perkinjo at cse.ohio-state.edu
Wed Mar 4 14:07:04 EST 2015


Hi Adam.  We've taken your patch; however, we are seeing a problem with the
portion that disables PSM from setting CPU affinity.  Below is a note from
Jian, written while debugging a PSM failure in our testing.

The failure of threads/pt2pt/ibsend in the MPICH test suite with PSM first
appears with commit cd34b59 (Adam's patch for disabling PSM from setting
CPU affinity). However, this commit is not the root cause. The issue has
potentially existed in previous MV2 versions: if we run the
threads/pt2pt/ibsend test with IPATH_NO_CPUAFFINITY=1 using older versions
of MV2, the failure can also be reproduced.
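
A minimal sketch of how the reproduction described in the note might be
scripted: it copies the current environment, sets IPATH_NO_CPUAFFINITY=1, and
runs a command under it. The helper names and the stand-in command are
hypothetical; the real launcher invocation for threads/pt2pt/ibsend is not
given in this thread and would replace the stand-in.

```python
import os
import subprocess
import sys


def env_with_psm_affinity_disabled(base_env=None):
    """Return a copy of the environment with IPATH_NO_CPUAFFINITY=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["IPATH_NO_CPUAFFINITY"] = "1"
    return env


def run_with_psm_affinity_disabled(cmd):
    """Run cmd with the variable set; return (exit code, captured stdout)."""
    result = subprocess.run(
        cmd,
        env=env_with_psm_affinity_disabled(),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode, result.stdout


# Stand-in command (assumption): a child Python process that echoes the
# variable. Replace it with the actual test invocation on your system.
STAND_IN_CMD = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('IPATH_NO_CPUAFFINITY'))",
]

if __name__ == "__main__":
    code, out = run_with_psm_affinity_disabled(STAND_IN_CMD)
    print(code, out.strip())
```

Passing the environment explicitly to the child keeps the setting out of the
parent shell, so runs with and without the variable can be compared side by
side.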

What I'm wondering is whether you've seen any issues with this before.  I'm
assuming that you've already used this patch on your builds.  Can you tell
us which version of PSM you're using?

We have QLogicIB-Basic.RHEL6-x86_64.7.0.1.0.43 installed on our system.

On Mon, Mar 2, 2015 at 12:51 PM Adam T. Moody <moody20 at llnl.gov> wrote:

> Great.  Thanks, Hari.
> -Adam
>
> Hari Subramoni wrote:
>
> >Hi Adam,
> >
> >Thanks for identifying the issue and providing the patch. We have taken it
> >into the code base. It should be available with our upcoming RC2 release.
> >
> >Regards,
> >Hari.
> >
> >On Fri, Feb 27, 2015 at 7:55 PM, Adam T. Moody <moody20 at llnl.gov> wrote:
> >
> >
> >
> >>Hello MVAPICH team,
> >>I've attached a patch with two modifications for the PSM channel:
> >>
> >>   - disable PSM from setting CPU affinity
> >>   - install PSM error handler to print more verbose error messages
> >>
> >>By default during psm_ep_open(), PSM sets CPU affinity on a process if
> >>it's not already set.  However, the affinity assigned by PSM causes some
> >>problems, especially for singleton MPI jobs, i.e., those run without
> >>mpirun or srun.  PSM binds each process based on its rank, so it binds
> >>all singleton jobs to core 0.  This causes problems when running
> >>multiple singleton jobs on the same node, since every job is bound to
> >>the same core.
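
The collision described above can be illustrated with a toy model. The modulo
placement rule below is an assumption made purely for illustration, not PSM's
actual placement logic; the function names are hypothetical. The point is only
that any rule keyed on rank sends every rank-0 process to the same core.

```python
from collections import Counter


def rank_based_core(local_rank, num_cores):
    """Toy rule (assumption): bind a process to core local_rank % num_cores."""
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    return local_rank % num_cores


def cores_chosen(local_ranks, num_cores):
    """Count how many processes land on each core under the toy rule."""
    return Counter(rank_based_core(r, num_cores) for r in local_ranks)


# One 4-process job: local ranks 0..3 spread across four distinct cores.
one_job = cores_chosen([0, 1, 2, 3], num_cores=8)

# Four singleton jobs: each process is rank 0 of its own job,
# so all four pick core 0 and compete for it.
four_singletons = cores_chosen([0, 0, 0, 0], num_cores=8)
```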
> >>
> >>Typically, people will rely on the process launcher like mpirun or srun
> >>to set CPU affinity for each MPI process.  Otherwise, they are most
> >>likely running singleton MPI jobs, in which case they probably don't
> >>want to bind all such jobs to the same core.  If someone does want to
> >>bind a singleton job, one can use a command like taskset or numactl,
> >>which then gives one full control over which CPU the process is bound to.
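
For comparison with taskset, here is a minimal sketch of pinning a process to
one CPU from inside a program, using Python's os.sched_getaffinity and
os.sched_setaffinity (available on Linux only). The helper name pin_to_cpu is
hypothetical, and this binds the calling Python process rather than an MPI
job.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Bind process pid (0 = the caller) to one CPU; return the old mask."""
    previous = os.sched_getaffinity(pid)
    if cpu not in previous:
        raise ValueError(
            f"CPU {cpu} is not in the allowed set {sorted(previous)}"
        )
    os.sched_setaffinity(pid, {cpu})
    return previous


if __name__ == "__main__":
    # Pick the lowest CPU this process is currently allowed to run on.
    target = min(os.sched_getaffinity(0))
    old_mask = pin_to_cpu(target)
    print("bound to:", sorted(os.sched_getaffinity(0)))
    # Restore the original mask so later work is not confined to one CPU.
    os.sched_setaffinity(0, old_mask)
```

Choosing the CPU explicitly, rather than deriving it from rank, is what lets
several singleton jobs on one node be spread over different cores.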
> >>
> >>The attached patch disables PSM affinity by specifying
> >>PSM_EP_OPEN_AFFINITY_SKIP as an option during psm_ep_open().
> >>
> >>This patch also installs a PSM error handler to print more verbose PSM
> >>error messages.  Currently, our error messages do not provide enough
> >>context, so we often see the same message printed for what may be many
> >>different errors.  This patch prints an additional error string with
> >>more info provided by PSM.
> >>-Adam
> >>
> >>_______________________________________________
> >>mvapich-discuss mailing list
> >>mvapich-discuss at cse.ohio-state.edu
> >>http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >>
> >>
> >>
> >>
> >
> >
> >
>

