[mvapich-discuss] Diagnosing Slow MVAPICH2 Startup

Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] matthew.thompson at nasa.gov
Tue Jan 12 09:21:47 EST 2016


MVAPICH2 Gurus,

Every so often on a cluster here at NASA Goddard I like to see how 
different MPI stacks size up in startup time. The test I use (rightly or 
wrongly) is a slightly old, multi-MPI-version-aware version of ziatest 
(like the one seen here: 
https://www.nersc.gov/users/computational-systems/cori/nersc-8-procurement/trinity-nersc-8-rfp/nersc-8-trinity-benchmarks/ziatest/ 
though mine doesn't have the 100MB bit).
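
To give a sense of what is being timed, the measurement is roughly of 
the shape sketched below. This is a simplified stand-in, not the actual 
ziatest source; it assumes a wrapper records the launch wall-clock time 
(epoch microseconds) and passes it as the first argument, and it reports 
how long the slowest rank took to come up:

    /* Simplified sketch of a launch-time measurement in the spirit of
     * ziatest.  Assumes the launch wall-clock time (epoch usec) is
     * passed as argv[1] by whatever wraps mpirun; the slowest rank's
     * startup delay is reported.  Not the actual ziatest code. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #include <mpi.h>

    static long long now_usec(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return (long long)tv.tv_sec * 1000000LL + (long long)tv.tv_usec;
    }

    int main(int argc, char **argv)
    {
        long long t_up = now_usec();      /* wall clock when this rank came up */

        MPI_Init(&argc, &argv);

        /* launch time is assumed to be supplied by the wrapper script */
        long long t_launch = (argc > 1) ? atoll(argv[1]) : t_up;
        long long mine = t_up - t_launch; /* this rank's startup delay */
        long long slowest = 0;
        int rank;

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Reduce(&mine, &slowest, 1, MPI_LONG_LONG, MPI_MAX, 0,
                   MPI_COMM_WORLD);
        if (rank == 0)
            printf("slowest rank was up %lld usec after launch\n", slowest);

        MPI_Finalize();
        return 0;
    }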

Using it, I run a series of tests (timings averaged over 10 runs) on 
28-core Haswell nodes, from 8 up to 256 nodes. At 16 nodes, for example, 
MVAPICH2 is about 4x slower than the others:

Intel MPI 5.1.2.150: 2009.29 ± 218.23 µs
SGI MPT 2.12:        1337.88 ±  99.62 µs
Open MPI 1.10.0:     1937.89 ± 163.66 µs

MVAPICH2 2.1rc1:     8575.81 ± 862.41 µs
MVAPICH2 2.2b_aa:    7998.25 ± 116.06 µs
MVAPICH2 2.2b_bb:    8175.28 ± 608.01 µs
MVAPICH2 2.2b_cc:    8422.5  ± 928.19 µs

For the MVAPICH2 tests, I use mpirun_rsh as the launcher. The 
"subscripts" mean: aa is no extra environment set; bb is 
MV2_ENABLE_AFFINITY=0 (a runtime warning suggested it, so I took its 
advice); and cc is MV2_ENABLE_AFFINITY=0 plus MV2_USE_SHMEM_COLL=0, 
following the suggestion in this thread: 
http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2015-October/005733.html.
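
For completeness, the variables are passed on the mpirun_rsh command 
line in the usual VAR=value way; the invocations look roughly like this 
(process count, hostfile, and executable name here are just 
placeholders):

    mpirun_rsh -np <nprocs> -hostfile <hostfile> \
        MV2_ENABLE_AFFINITY=0 MV2_USE_SHMEM_COLL=0 ./<test_executable>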

As you can see, MVAPICH2 seems to be a slow starter, and it stays that 
way at the high end too, though the relative gap is smaller at 256 
nodes[1]:

Intel MPI 5.1.2.150:  2841.19 ±  566.81 µs
SGI MPT 2.12:        10961.4  ± 1070.88 µs
Open MPI 1.10.0:      8959.72 ±  244.46 µs

MVAPICH2 2.1rc1:     16099    ± 1035.97 µs
MVAPICH2 2.2b_aa:    16570.7  ± 2089.64 µs
MVAPICH2 2.2b_bb:    16197.3  ± 1414.67 µs
MVAPICH2 2.2b_cc:    16358    ± 1123.69 µs

Now, the cluster I'm running on is SLURM 14-based (I think), so I can't 
yet ask the admins to try the PMIx patch I see on your download page 
(it appears to be versioned for SLURM 15). I'd imagine that could help, 
right?

Still, I'm wondering whether this could be something as basic as 
needing to build MVAPICH2 differently, or perhaps an environment 
variable I should be setting?

Matt

[1] Note: I'm not sure why Intel MPI is doing so well. My suspicion is 
that my test and Intel MPI are somehow missing some interaction the 
others are seeing.

-- 
Matt Thompson, SSAI, Sr Scientific Programmer/Analyst
NASA GSFC,    Global Modeling and Assimilation Office
Code 610.1,  8800 Greenbelt Rd,  Greenbelt,  MD 20771
Phone: 301-614-6712                 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson

