[mvapich-discuss] Slow MPI_Gatherv

Aaron Knister aaron.knister at gmail.com
Mon Jan 12 21:30:48 EST 2015


Hi Everyone,

I'm having performance issues with MPI_Gatherv calls taking a long time. I'm
currently testing with the osu_gatherv benchmark, and I've tried both
mvapich2 2.0.1 and 2.1rc1. What's curious is that MPI_Gather calls return
very quickly. Maybe that makes sense; I'm not sure what the differences are
there.
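
For reference, the API-level difference as I understand it: MPI_Gather takes
a single receive count that applies to every rank, while MPI_Gatherv takes
per-rank recvcounts and displs arrays at the root. Below is a minimal Python
sketch of that receive-buffer layout. The helper name gatherv_layout is
hypothetical (it is not part of MPI or the OSU benchmarks), and the sketch
says nothing about how mvapich2 implements either collective internally.

```python
def gatherv_layout(recvcounts):
    """Compute contiguous displacements for a gatherv-style receive buffer.

    recvcounts[i] is the number of elements rank i contributes.
    Returns (displs, total), where displs[i] is the offset in the root's
    buffer at which rank i's data starts and total is the buffer size.
    """
    displs = []
    offset = 0
    for count in recvcounts:
        displs.append(offset)
        offset += count
    return displs, offset


# Gather-style case: every rank sends the same count, so the root only
# needs one integer to describe the whole layout.
uniform_displs, uniform_total = gatherv_layout([1, 1, 1, 1])
print(uniform_displs, uniform_total)

# Gatherv-style case: counts differ per rank, so the root needs one
# count and one displacement for every rank in the communicator.
ragged_displs, ragged_total = gatherv_layout([3, 0, 2])
print(ragged_displs, ragged_total)
```

The point is only that the root-side arguments for MPI_Gatherv grow with the
number of ranks, whereas MPI_Gather's do not.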

I'm using Mellanox OFED version MLNX_OFED_LINUX-2.3-1.0.1 (OFED-2.3-1.0.0)
with ConnectIB hardware which uses the mlx5 driver.

Here are the configure args for the builds:

mvapich2-2.1rc1: ./configure --with-device=ch3:mrail --with-rdma=gen2
--enable-fast --with-pm=hydra --enable-hybrid --enable-xrc=yes --enable-fc
--enable-f77 --enable-cxx --with-hwloc --enable-mcast

mvapich2-2.0.1: ./configure --disable-wrapper-rpath --with-device=ch3:mrail
--with-rdma=gen2 CC=icc CXX=icpc F77=ifort FC=ifort --enable-hybrid
-disable-multi-aliases --enable-romio --enable-threads=default
--enable-fast --with-pm=hydra --enable-hybrid --enable-xrc=yes --enable-fc
--enable-f77 --enable-cxx --with-hwloc --enable-mcast

Both were built with the Intel C/Fortran compilers, version 2015.0.090.

Here are some performance numbers for the osu_gatherv runs over 1000 nodes.
I'm running with the args "-m 1 -i 1". My goal is to get to 30,000
cores:

mvapich2-2.1rc1:

1000 cores, 2.02 us latency, 0:06 minutes elapsed
1500 cores, 3738 us latency, 0:23 minutes elapsed
1800 cores, 120680 us latency, 1:19 minutes elapsed
2000 cores, 7234 us latency, 2:16 minutes elapsed
2500 cores, 582650 us latency, 3:26 minutes elapsed
3500 cores, 619694 us latency, hadn't completed after 10 minutes (printed
the benchmark result after a few minutes but sat around until I killed it
after 10 minutes)

mvapich2-2.0.1:

1000 cores, 2.12 us latency, 0:07 minutes elapsed
1500 cores, 4016 us latency, 1:04 minutes elapsed
1800 cores, 4990 us latency, hadn't completed after 10 minutes (printed the
benchmark result after a few minutes but sat around until I killed it after
10 minutes)
2000 cores, 7289 us latency, 2:06 minutes elapsed
2500 cores, 10498 us latency, 3:13 minutes elapsed
3500 cores, 34379 us latency, hadn't completed after 10 minutes (printed
the benchmark result after a few minutes but sat around until I killed it
after 10 minutes)


I can get to 28,000 cores with osu_gather (not osu_gatherv) using the same
options and mvapich2 2.0.1 (I haven't tried 2.1rc1): latency is 5.07 us and
the whole run finishes in 3:15 minutes.

Any help would be much appreciated. Please let me know if there's any
additional information I can get that would be helpful.

Best,
Aaron