[mvapich-discuss] Slow MV2_USE_SHMEM_COLL

Marcin Rogowski marcin.rogowski at gmail.com
Fri Oct 9 11:38:53 EDT 2015


Hello,

I have been trying to diagnose what causes a huge slowdown in one part of
our application between MVAPICH2 1.9 and 2.0.1, and eventually came up with
a test case that simply calls MPI_Gather of 16 MPI_CHARACTERs to process 0.
Timings over 51 iterations are as follows:

(setenv MV2_USE_SHMEM_COLL 1)
$ mpirun_rsh -np 3000 -hostfile nodes ./a.out
took 20.7183160781860 seconds 0.406241491729138 per gather

(setenv MV2_USE_SHMEM_COLL 0)
$ mpirun_rsh -np 3000 -hostfile nodes ./a.out
took 2.943396568298340E-002 seconds 5.771365820192823E-004 per gather

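For reference, the test is essentially the following, shown here as a C
sketch for illustration only (our test calls MPI_Gather with the Fortran
MPI_CHARACTER type, so MPI_CHAR stands in for it below; buffer contents and
output formatting are placeholders):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int niter = 51, count = 16;   /* 51 iterations, 16 characters */
    char sendbuf[16] = {0};
    char *recvbuf = NULL;
    int rank, size, i;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* only the root needs a receive buffer */
    if (rank == 0)
        recvbuf = malloc((size_t)size * count);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < niter; i++)
        MPI_Gather(sendbuf, count, MPI_CHAR,
                   recvbuf, count, MPI_CHAR, 0, MPI_COMM_WORLD);
    t1 = MPI_Wtime();

    if (rank == 0) {
        printf("took %g seconds %g per gather\n",
               t1 - t0, (t1 - t0) / niter);
        free(recvbuf);
    }
    MPI_Finalize();
    return 0;
}
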

Interestingly, if the hostfile contains unique host names (by default we
use node1 repeated 'number of CPU cores' times, followed by node2, etc.),
the observation does not hold - both gathers are fast.
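
To illustrate the two hostfile layouts (host names are placeholders): the
default file repeats each host once per core, i.e.

node1
node1
(... 24 lines of node1 in total, then 24 lines of node2, and so on ...)

whereas the unique-host-name variant lists each host only once:

node1
node2
node3
(...)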

The problem does not seem to appear before release 2.0.1. An easy solution
would be to disable the collective shared memory optimizations or to use
unique host lists; however, both workarounds slow down other parts of the
application, on average exactly offsetting the benefits.

Please let me know if you would like any details of our cluster
environment (24-core Xeons with QLogic InfiniBand). I would be really
grateful if you could share any ideas about what could be causing our
problem and help us achieve optimal performance.

Thank you.


Regards,
Marcin Rogowski
Saudi Aramco