[mvapich-discuss] Diagnosing Slow MVAPICH2 Startup

Hari Subramoni subramoni.1 at osu.edu
Fri Jan 15 14:38:00 EST 2016


Hello Matt,

This is a little strange. This differs from what we've observed on other
supercomputing clusters like Stampede at TACC. I'm following up with some
further questions in a private e-mail.

Thx,
Hari.

On Fri, Jan 15, 2016 at 2:10 PM, Thompson, Matt (GSFC-610.1)[SCIENCE
SYSTEMS AND APPLICATIONS INC] <matthew.thompson at nasa.gov> wrote:

> Hari,
>
> Our admins installed a new MVAPICH2 with that option disabled. I also
> modified my ziatest scripts since I think some of them did not correctly
> use -export.
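>
> (In case it's useful, the launch line in my scripts now looks roughly like
> this, with the hostfile and executable names trimmed to placeholders, and
> 448 being 16 nodes x 28 cores:
>
>     mpirun_rsh -export -np 448 -hostfile ./hosts ./ziatest.x
>
> The -export flag pushes my local environment to the launched processes, so
> any MV2_* variables for a given test are set in that environment before
> launch.)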
>
> I re-ran my tests and the MVAPICH2 numbers didn't change much. Here are the
> results at 16 nodes:
>
> Without --disable-rdma-cm:
>
> MV2 2.2b:                    8897.59
> MV2 2.2b + DISABLE_AFFINITY: 7227.84
> MV2 2.2b + HOMOGENEOUS:      7557
>
> With --disable-rdma-cm:
>
> MV2 2.2b:                    8393.73
> MV2 2.2b + DISABLE_AFFINITY: 7403.94
> MV2 2.2b + HOMOGENEOUS:      7224.22
>
> So. Hmm.
>
>
> On 01/14/2016 01:09 PM, Hari Subramoni wrote:
>
>> Hello Matt,
>>
>> Can you please do the following two things and let us know what numbers
>> you get with MVAPICH2?
>>
>> 1. Re-configure MVAPICH2 after disabling RDMA_CM
>>      - Give "--disable-rdma-cm" when you configure MVAPICH2
>> 2. Set the environment variable MV2_HOMOGENEOUS_CLUSTER=1 (rough sketch of
>>    both steps below)
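>>
>> For example (your existing configure options stay as they are; the process
>> count, hostfile, and executable names here are just placeholders):
>>
>>     ./configure --disable-rdma-cm <your existing options>
>>     make && make install
>>
>>     mpirun_rsh -np 448 -hostfile ./hosts MV2_HOMOGENEOUS_CLUSTER=1 ./ziatest.x
>>
>> With mpirun_rsh, variables given on the command line like this are passed
>> to all the launched processes.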
>>
>> Thx,
>> Hari.
>>
>> On Wed, Jan 13, 2016 at 3:20 PM, Hari Subramoni
>> <subramoni.1 at osu.edu> wrote:
>>
>>     Hello Matt,
>>
>>     Thanks for providing the information. We're taking a look at this
>>     issue. We will update you with our findings soon.
>>
>>     Thx,
>>     Hari.
>>
>>     On Wed, Jan 13, 2016 at 10:41 AM, Thompson, Matt
>>     (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]
>>     <matthew.thompson at nasa.gov> wrote:
>>
>>         DK,
>>
>>         Also, for your edification, our admins said they compiled
>>         MVAPICH2 2.2b as:
>>
>>         ./configure --with-cma --with-limic2 --disable-wrapper-rpath
>>         --with-device=ch3:mrail --with-rdma=gen2 CC=icc CXX=icpc
>>         F77=ifort FC=ifort CFLAGS="-fpic -m64" CXXFLAGS="-fpic -m64"
>>         FFLAGS="-m64 -fpic" FCFLAGS="-m64 -fpic" --enable-f77
>>         --enable-fc --enable-cxx --enable-romio --enable-threads=default
>>         --with-hwloc -disable-multi-aliases -enable-xrc=yes -enable-hybrid
>>
>>         Matt
>>
>>         On 01/12/2016 10:03 AM, Panda, Dhabaleswar wrote:
>>
>>             Hi Matt,
>>
>>             Thanks for your note. We will take a look at the ZiaTest to
>>             see what could be going on here. At first glance, you should
>>             not run MVAPICH2 with MV2_USE_SHMEM_COLL=0; many optimized
>>             collectives are disabled when this parameter is set to 0.
>>             Have you run the test without setting it? Just use the
>>             default configuration with MV2_ENABLE_AFFINITY=0.
>>
>>             Thanks,
>>
>>             DK
>>
>>
>>             ________________________________________
>>             From: mvapich-discuss-bounces at cse.ohio-state.edu on behalf
>>             of Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND
>>             APPLICATIONS INC] [matthew.thompson at nasa.gov]
>>             Sent: Tuesday, January 12, 2016 9:26 AM
>>             To: mvapich-discuss at cse.ohio-state.edu
>>             Subject: [mvapich-discuss] Diagnosing Slow MVAPICH2 Startup
>>
>>             MVAPICH2 Gurus,
>>
>>             (NOTE: I am resending this. The last one appeared as garbage
>>             for some
>>             reason.)
>>
>>             Every so often on a cluster here at NASA Goddard I like to
>>             see how
>>             different MPI stacks compare in startup time. The test I use
>>             (rightly or
>>             wrongly) is a slightly old, multi-MPI-version-aware version
>>             of ziatest
>>             (like that seen here
>>
>> https://www.nersc.gov/users/computational-systems/cori/nersc-8-procurement/trinity-nersc-8-rfp/nersc-8-trinity-benchmarks/ziatest/
>>             but mine doesn't have the 100MB bit).
>>
>>             Using it, I run a series of tests (timings averaged over 10
>>             runs) on
>>             28-core Haswell nodes from 8 to 256 nodes. If we look at,
>>             for example,
>>             16 nodes, MVAPICH2 is about 4x slower:
>>
>>             Intel MPI 5.1.2.150: 2009.29 +/- 218.23 us
>>             SGI MPT 2.12:        1337.88 +/-  99.62 us
>>             Open MPI 1.10.0:     1937.89 +/- 163.66 us
>>
>>             MVAPICH2 2.1rc1:     8575.81 +/- 862.41 us
>>             MVAPICH2 2.2b_aa:    7998.25 +/- 116.06 us
>>             MVAPICH2 2.2b_bb:    8175.28 +/- 608.01 us
>>             MVAPICH2 2.2b_cc:    8422.5  +/- 928.19 us
>>
>>             For the MVAPICH2 tests, I use mpirun_rsh as the launcher. The
>>             "subscripts" are: aa is no extra environment set, bb is
>>             MV2_ENABLE_AFFINITY=0 (there was a warning, so I tried its
>>             advice), and cc is MV2_ENABLE_AFFINITY=0 plus
>>             MV2_USE_SHMEM_COLL=0, which I tried after seeing this thread:
>>
>> http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2015-October/005733.html
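>>
>>             Concretely, the three variants are launched roughly like this
>>             (hostfile and executable names trimmed to placeholders; 448 =
>>             16 nodes x 28 cores):
>>
>>               # aa: no extra environment
>>               mpirun_rsh -np 448 -hostfile ./hosts ./ziatest.x
>>               # bb: affinity disabled
>>               mpirun_rsh -np 448 -hostfile ./hosts \
>>                   MV2_ENABLE_AFFINITY=0 ./ziatest.x
>>               # cc: affinity and shared-memory collectives disabled
>>               mpirun_rsh -np 448 -hostfile ./hosts \
>>                   MV2_ENABLE_AFFINITY=0 MV2_USE_SHMEM_COLL=0 ./ziatest.x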
>>
>>             As you can see, MVAPICH2 seems to be a slow starter, and it
>>             stays that
>>             way at the high end too, though not as bad at 256 nodes[1]:
>>
>>             Intel MPI 5.1.2.150:  2841.19 +/-  566.81 us
>>             SGI MPT 2.12:        10961.4  +/- 1070.88 us
>>             Open MPI 1.10.0:      8959.72 +/-  244.46 us
>>
>>             MVAPICH2 2.1rc1:     16099    +/- 1035.97 us
>>             MVAPICH2 2.2b_aa:    16570.7  +/- 2089.64 us
>>             MVAPICH2 2.2b_bb:    16197.3  +/- 1414.67 us
>>             MVAPICH2 2.2b_cc:    16358    +/- 1123.69 us
>>
>>             Now the cluster I'm running on is SLURM 14-based (I think), so I
>>             can't yet
>>             have the admins try out the PMIx patch I see on your
>>             download page (as
>>             it seems to be SLURM 15 versioned). I'd imagine that could
>>             possibly
>>             help, right?
>>
>>             Still, I'm thinking it could be something as basic as a need
>>             to build differently, or perhaps an environment variable that
>>             needs to be set?
>>
>>             Matt
>>
>>             [1] Note, not sure why Intel MPI is doing so well. I'm
>>             thinking my test
>>             and Intel MPI might be magically missing some interaction
>>             the others are
>>             seeing.
>>
>>             --
>>             Matt Thompson, SSAI, Sr Scientific Programmer/Analyst
>>             NASA GSFC,    Global Modeling and Assimilation Office
>>             Code 610.1,  8800 Greenbelt Rd,  Greenbelt,  MD 20771
>>             Phone: 301-614-6712                 Fax: 301-614-6246
>>             http://science.gsfc.nasa.gov/sed/bio/matthew.thompson
>>
>>
>>
>>         --
>>         Matt Thompson, SSAI, Sr Scientific Programmer/Analyst
>>         NASA GSFC,    Global Modeling and Assimilation Office
>>         Code 610.1,  8800 Greenbelt Rd,  Greenbelt,  MD 20771
>>         Phone: 301-614-6712                 Fax: 301-614-6246
>>         http://science.gsfc.nasa.gov/sed/bio/matthew.thompson
>>
>>
>>
>>
>
> --
> Matt Thompson, SSAI, Sr Scientific Programmer/Analyst
> NASA GSFC,    Global Modeling and Assimilation Office
> Code 610.1,  8800 Greenbelt Rd,  Greenbelt,  MD 20771
> Phone: 301-614-6712                 Fax: 301-614-6246
> http://science.gsfc.nasa.gov/sed/bio/matthew.thompson
>