[mvapich-discuss] Disabling CPU affinity in PSM

Moody, Adam T. moody20 at llnl.gov
Wed Mar 4 14:23:13 EST 2015


Hi Jonathan,
I've just recently installed this on one of our clusters.  I haven't received any error reports, but not many users have tried it yet.

We use what I think is a somewhat customized version of PSM.  The rpm is called "infinipath-psm-3.1c-2chaos.ch5.x86_64".  We pull this version from here:
    https://github.com/01org/psm
I see a 3.1 release there, so that could be the one we're using.  I can find out the exact version if you'd like to know.

What type of error do you see in your test?
-Adam


________________________________
From: Jonathan Perkins [perkinjo at cse.ohio-state.edu]
Sent: Wednesday, March 04, 2015 11:07 AM
To: Moody, Adam T.; Hari Subramoni
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Disabling CPU affinity in PSM

Hi Adam.  We've taken your patch; however, we are seeing a problem with the portion that disables PSM from setting CPU affinity.  Below is a note from Jian from debugging a PSM failure in our testing.

The failure of threads/pt2pt/ibsend from the MPICH test suite with PSM first appears with the cd34b59 commit (Adam's patch for disabling PSM from setting CPU affinity). However, this commit is not the cause; the issue has likely existed in previous MV2 versions as well. If we run the threads/pt2pt/ibsend test with IPATH_NO_CPUAFFINITY=1 using older versions of MV2, the failure can also be reproduced.

What I'm wondering is whether you've seen any issues with this before.  I'm assuming that you've already used this patch in your builds.  Can you tell us which version of PSM you're using?

We have QLogicIB-Basic.RHEL6-x86_64.7.0.1.0.43 installed on our system.

On Mon, Mar 2, 2015 at 12:51 PM Adam T. Moody <moody20 at llnl.gov> wrote:
Great.  Thanks, Hari.
-Adam

Hari Subramoni wrote:

>Hi Adam,
>
>Thanks for identifying the issue and providing the patch. We have taken it
>into the code base. It should be available with our upcoming RC2 release.
>
>Regards,
>Hari.
>
>On Fri, Feb 27, 2015 at 7:55 PM, Adam T. Moody <moody20 at llnl.gov> wrote:
>
>>Hello MVAPICH team,
>>I've attached a patch with two modifications for the PSM channel:
>>
>>   - disable PSM from setting CPU affinity
>>   - install PSM error handler to print more verbose error messages
>>
>>By default, during psm_ep_open() PSM sets CPU affinity on a process if
>>it's not already set.  However, the affinity assigned by PSM causes some
>>problems, especially for singleton MPI jobs, i.e., those run without
>>mpirun or srun.  PSM binds each process based on its rank, so it binds
>>all singleton jobs to core 0.  This causes problems when running multiple
>>singleton jobs on the same node, since every job is bound to the same core.
>>
>>Typically, people rely on a process launcher such as mpirun or srun to
>>set CPU affinity for each MPI process.  Otherwise, they are most likely
>>running singleton MPI jobs, in which case they probably don't want all
>>such jobs bound to the same core.  Anyone who does want to bind a
>>singleton job can use a command like taskset or numactl, which gives
>>full control over which CPU the process is bound to.
>>
>>The attached patch disables PSM affinity by specifying
>>PSM_EP_OPEN_AFFINITY_SKIP as an option during psm_ep_open().
>>
>>This patch also installs a PSM error handler to print more verbose PSM
>>error messages.  Currently, our error messages do not provide enough
>>context, so we often see the same message printed for what may be many
>>different errors.  This patch prints an additional error string with more
>>info provided by PSM.
>>-Adam
>>
>>_______________________________________________
>>mvapich-discuss mailing list
>>mvapich-discuss at cse.ohio-state.edu
>>http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss


