[mvapich-discuss] Diagnosing Slow MVAPICH2 Startup
Panda, Dhabaleswar
panda at cse.ohio-state.edu
Tue Jan 12 10:03:01 EST 2016
Hi Matt,
Thanks for your note. We will take a look at ZiaTest to see what could be going on here. At first glance,
you should not run MVAPICH2 with MV2_USE_SHMEM_COLL=0: many optimized collectives are disabled when this parameter is set to 0. Have you run the test without setting this parameter to 0? Just use the
default configuration with MV2_ENABLE_AFFINITY=0.
Thanks,
DK
________________________________________
From: mvapich-discuss-bounces at cse.ohio-state.edu on behalf of Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] [matthew.thompson at nasa.gov]
Sent: Tuesday, January 12, 2016 9:26 AM
To: mvapich-discuss at cse.ohio-state.edu
Subject: [mvapich-discuss] Diagnosing Slow MVAPICH2 Startup
MVAPICH2 Gurus,
(NOTE: I am resending this. The last one appeared as garbage for some
reason.)
Every so often on a cluster here at NASA Goddard I like to see how
different MPI stacks size up in startup time. The test I use (rightly or
wrongly) is a slightly old, multi-MPI-version-aware version of ziatest
(like that seen here
https://www.nersc.gov/users/computational-systems/cori/nersc-8-procurement/trinity-nersc-8-rfp/nersc-8-trinity-benchmarks/ziatest/
but mine doesn't have the 100MB bit).
Using it, I run a series of tests (timings averaged over 10 runs) on
28-core Haswell nodes from 8 to 256 nodes. If we look at, for example,
16 nodes, MVAPICH2 is about 4x slower:
Intel MPI 5.1.2.150: 2009.29 +/- 218.23 us
SGI MPT 2.12:        1337.88 +/- 99.62 us
Open MPI 1.10.0:     1937.89 +/- 163.66 us
MVAPICH2 2.1rc1:     8575.81 +/- 862.41 us
MVAPICH2 2.2b_aa:    7998.25 +/- 116.06 us
MVAPICH2 2.2b_bb:    8175.28 +/- 608.01 us
MVAPICH2 2.2b_cc:    8422.50 +/- 928.19 us
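(For reference, I take the "+/-" figures to be mean and sample standard deviation over the 10 runs; a quick awk sketch with made-up placeholder timings, not real measurements, reproduces that format:)

```shell
# Mean +/- sample standard deviation, formatted like the tables above.
# The two input values are placeholders, not actual ziatest timings.
printf '%s\n' 10 14 \
  | awk '{ s += $1; ss += $1*$1; n++ }
         END { m = s/n; sd = sqrt((ss - n*m*m)/(n-1));
               printf "%.2f +/- %.2f us\n", m, sd }'
# -> 12.00 +/- 2.83 us
```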
For the MVAPICH2 tests, I use mpirun_rsh as the launcher. The
"subscripts" are: aa is no environment set, bb is MV2_ENABLE_AFFINITY=0
(there was a warning so I tried its advice), and cc is
MV2_ENABLE_AFFINITY=0 plus MV2_USE_SHMEM_COLL=0 based on maybe trying
this:
http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2015-October/005733.html.
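For concreteness, the three variants correspond to command lines roughly along these lines (a sketch only: the host file name, rank count, and ziatest invocation are assumptions; mpirun_rsh accepts MV2_* settings on the command line before the executable):

```shell
# Sketch: 'hosts' and the rank count are placeholders, not my actual setup.
NP=448   # e.g. 16 nodes x 28 Haswell cores (assumption)
# aa: no MV2_* variables set
echo "mpirun_rsh -np $NP -hostfile hosts ./ziatest"
# bb: disable affinity, per the runtime warning's advice
echo "mpirun_rsh -np $NP -hostfile hosts MV2_ENABLE_AFFINITY=0 ./ziatest"
# cc: additionally disable shared-memory collectives
echo "mpirun_rsh -np $NP -hostfile hosts MV2_ENABLE_AFFINITY=0 MV2_USE_SHMEM_COLL=0 ./ziatest"
```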
As you can see, MVAPICH2 seems to be a slow starter, and it stays that
way at the high end too, though not as bad at 256 nodes[1]:
Intel MPI 5.1.2.150: 2841.19 +/- 566.81 us
SGI MPT 2.12:        10961.4 +/- 1070.88 us
Open MPI 1.10.0:     8959.72 +/- 244.46 us
MVAPICH2 2.1rc1:     16099 +/- 1035.97 us
MVAPICH2 2.2b_aa:    16570.7 +/- 2089.64 us
MVAPICH2 2.2b_bb:    16197.3 +/- 1414.67 us
MVAPICH2 2.2b_cc:    16358 +/- 1123.69 us
Now the cluster I'm running is SLURM 14-based (I think) so I can't yet
have the admins try out the PMIx patch I see on your download page (as
it seems to be SLURM 15 versioned). I'd imagine that could possibly
help, right?
Still, I'm thinking it could be something as basic as needing to build
differently, or perhaps a missing environment variable?
Matt
[1] Note: I'm not sure why Intel MPI is doing so well. I suspect my test
and Intel MPI are magically missing some interaction the others are
seeing.
--
Matt Thompson, SSAI, Sr Scientific Programmer/Analyst
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss