[mvapich-discuss] Diagnosing Slow MVAPICH2 Startup

Hari Subramoni subramoni.1 at osu.edu
Thu Jan 14 13:09:59 EST 2016


Hello Matt,

Can you please do the following two things and let us know what numbers you
get with MVAPICH2?

1. Re-configure MVAPICH2 with RDMA_CM disabled
    - Pass "--disable-rdma-cm" when you configure MVAPICH2
2. Set the environment variable MV2_HOMOGENEOUS_CLUSTER=1 at run time
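
For example, a minimal sketch of both steps (the process count, hostfile,
and benchmark binary below are placeholders for your own setup):

    # 1. Re-configure and rebuild with RDMA_CM disabled
    ./configure --disable-rdma-cm [your existing configure options]
    make && make install

    # 2. Launch with MV2_HOMOGENEOUS_CLUSTER=1 set via mpirun_rsh
    mpirun_rsh -np 448 -hostfile ./hosts MV2_HOMOGENEOUS_CLUSTER=1 ./ziatest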

Thx,
Hari.

On Wed, Jan 13, 2016 at 3:20 PM, Hari Subramoni <subramoni.1 at osu.edu> wrote:

> Hello Matt,
>
> Thanks for providing the information. We're taking a look at this issue.
> We will update you with our findings soon.
>
> Thx,
> Hari.
>
> On Wed, Jan 13, 2016 at 10:41 AM, Thompson, Matt (GSFC-610.1)[SCIENCE
> SYSTEMS AND APPLICATIONS INC] <matthew.thompson at nasa.gov> wrote:
>
>> DK,
>>
>> Also, for your edification, our admins said they compiled MVAPICH2 2.2b
>> as:
>>
>> ./configure --with-cma --with-limic2 --disable-wrapper-rpath
>> --with-device=ch3:mrail --with-rdma=gen2 CC=icc CXX=icpc F77=ifort FC=ifort
>> CFLAGS="-fpic -m64" CXXFLAGS="-fpic -m64" FFLAGS="-m64 -fpic" FCFLAGS="-m64
>> -fpic" --enable-f77 --enable-fc --enable-cxx --enable-romio
>> --enable-threads=default --with-hwloc -disable-multi-aliases
>> -enable-xrc=yes -enable-hybrid
>>
>> Matt
>>
>> On 01/12/2016 10:03 AM, Panda, Dhabaleswar wrote:
>>
>>> Hi Matt,
>>>
>>> Thanks for your note. We will take a look at ziatest to see what
>>> could be going on here. At first glance, you should not run MVAPICH2
>>> with MV2_USE_SHMEM_COLL=0; many optimized collectives are disabled when
>>> that parameter is set to 0. Have you run the test without setting it?
>>> Just use the default settings, keeping only MV2_ENABLE_AFFINITY=0.
>>>
>>> Thanks,
>>>
>>> DK
>>>
>>>
>>> ________________________________________
>>> From: mvapich-discuss-bounces at cse.ohio-state.edu on behalf of Thompson,
>>> Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] [
>>> matthew.thompson at nasa.gov]
>>> Sent: Tuesday, January 12, 2016 9:26 AM
>>> To: mvapich-discuss at cse.ohio-state.edu
>>> Subject: [mvapich-discuss] Diagnosing Slow MVAPICH2 Startup
>>>
>>> MVAPICH2 Gurus,
>>>
>>> (NOTE: I am resending this. The last one appeared as garbage for some
>>> reason.)
>>>
>>> Every so often on a cluster here at NASA Goddard I like to see how
>>> different MPI stacks size up in startup time. The test I use (rightly or
>>> wrongly) is a slightly old, multi-MPI-version-aware version of ziatest
>>> (like that seen here
>>>
>>> https://www.nersc.gov/users/computational-systems/cori/nersc-8-procurement/trinity-nersc-8-rfp/nersc-8-trinity-benchmarks/ziatest/
>>> but mine doesn't have the 100MB bit).
>>>
>>> Using it, I run a series of tests (timings averaged over 10 runs) on
>>> 28-core Haswell nodes, from 8 to 256 nodes. If we look at, for example,
>>> 16 nodes, MVAPICH2 is about 4x slower:
>>>
>>> Intel MPI 5.1.2.150: 2009.29 +/- 218.23 us
>>> SGI MPT 2.12:        1337.88 +/-  99.62 us
>>> Open MPI 1.10.0:     1937.89 +/- 163.66 us
>>>
>>> MVAPICH2 2.1rc1:     8575.81 +/- 862.41 us
>>> MVAPICH2 2.2b_aa:    7998.25 +/- 116.06 us
>>> MVAPICH2 2.2b_bb:    8175.28 +/- 608.01 us
>>> MVAPICH2 2.2b_cc:    8422.5  +/- 928.19 us
>>>
>>> For the MVAPICH2 tests, I use mpirun_rsh as the launcher. The
>>> "subscripts" are: aa is no environment variables set; bb is
>>> MV2_ENABLE_AFFINITY=0 (there was a warning, so I tried its advice); and
>>> cc is MV2_ENABLE_AFFINITY=0 plus MV2_USE_SHMEM_COLL=0, based on this
>>> thread:
>>>
>>> http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2015-October/005733.html
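>>>
>>> Concretely, the three variants are launched roughly like this (a
>>> sketch; -np 448 is 16 nodes x 28 cores, and the hostfile and benchmark
>>> binary names are placeholders):
>>>
>>>    # aa: no MV2_* variables set
>>>    mpirun_rsh -np 448 -hostfile ./hosts ./ziatest
>>>    # bb: process affinity disabled
>>>    mpirun_rsh -np 448 -hostfile ./hosts MV2_ENABLE_AFFINITY=0 ./ziatest
>>>    # cc: affinity and shared-memory collectives disabled
>>>    mpirun_rsh -np 448 -hostfile ./hosts MV2_ENABLE_AFFINITY=0 MV2_USE_SHMEM_COLL=0 ./ziatest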
>>>
>>> As you can see, MVAPICH2 seems to be a slow starter, and it stays that
>>> way at the high end too, though not as bad at 256 nodes[1]:
>>>
>>> Intel MPI 5.1.2.150:  2841.19 +/-  566.81 us
>>> SGI MPT 2.12:        10961.4  +/- 1070.88 us
>>> Open MPI 1.10.0:      8959.72 +/-  244.46 us
>>>
>>> MVAPICH2 2.1rc1:     16099    +/- 1035.97 us
>>> MVAPICH2 2.2b_aa:    16570.7  +/- 2089.64 us
>>> MVAPICH2 2.2b_bb:    16197.3  +/- 1414.67 us
>>> MVAPICH2 2.2b_cc:    16358    +/- 1123.69 us
>>>
>>> Now, the cluster I'm running on is SLURM 14-based (I think), so I can't
>>> yet have the admins try out the PMIx patch I see on your download page
>>> (it seems to be versioned for SLURM 15). I'd imagine that could help,
>>> right?
>>>
>>> Still, I'm wondering whether it could be something as basic as needing
>>> to build differently, or perhaps a missing environment variable?
>>>
>>> Matt
>>>
>>> [1] Note: I'm not sure why Intel MPI is doing so well. I suspect my test
>>> and Intel MPI might simply be avoiding some interaction the other stacks
>>> are hitting.
>>>
>>> --
>>> Matt Thompson, SSAI, Sr Scientific Programmer/Analyst
>>> NASA GSFC,    Global Modeling and Assimilation Office
>>> Code 610.1,  8800 Greenbelt Rd,  Greenbelt,  MD 20771
>>> Phone: 301-614-6712                 Fax: 301-614-6246
>>> http://science.gsfc.nasa.gov/sed/bio/matthew.thompson
>>>
>>>
>>
>> --
>> Matt Thompson, SSAI, Sr Scientific Programmer/Analyst
>> NASA GSFC,    Global Modeling and Assimilation Office
>> Code 610.1,  8800 Greenbelt Rd,  Greenbelt,  MD 20771
>> Phone: 301-614-6712                 Fax: 301-614-6246
>> http://science.gsfc.nasa.gov/sed/bio/matthew.thompson
>>
>
>