[mvapich-discuss] Single node performance issue

Mayer, Benjamin W. mayerbw at ornl.gov
Mon Oct 24 10:25:04 EDT 2016


We are seeing a performance issue with MVAPICH2 2.2 and 2.2rc1 on a single node, and worse behavior on multiple nodes. This behavior is not seen while using OpenMPI.



All data were collected by running the OSU Allgather microbenchmark with sample sizes of 50-100 instances. MVAPICH2 has been compiled and tested with Intel 2017.0.098, Intel 2016.1, and GNU 5.3.0. The machine has Mellanox CX4 adaptors.
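
For reference, a typical single-node invocation looks like the following sketch; the exact install path of the osu_allgather binary is an assumption for illustration.

# Minimal sketch of one benchmark run on a single node with 32 MPI tasks.
# The osu_allgather path is illustrative -- adjust for your install prefix.
mpiexec -np 32 \
    $SW_BLDDIR/libexec/osu-micro-benchmarks/mpi/collective/osu_allgather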



With MVAPICH2 2.2 on a single node, 32 tasks, 1 thread per task, we see high variability at small data sizes (~50 samples per size). A large percentage of runs have normal run times (4-5 us), but a moderate number are up to 2x slower (10 us), a handful at each data size are extreme outliers (10,000 us), and a small number are killed because they never finish.
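
One variable we have not yet ruled out on the single node is process placement. A sketch of a test run with binding pinned explicitly, using MVAPICH2's documented affinity parameters (the binding policy value shown is just one possibility):

# Pin ranks to cores to rule out placement effects on run-to-run variability.
MV2_ENABLE_AFFINITY=1 MV2_CPU_BINDING_POLICY=bunch \
    mpiexec -np 32 ./osu_allgather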



MVAPICH2 2.2rc1 in the same configuration (80 samples) behaves similarly, except that it has about 4x the rate of extreme outliers and is generally a bit slower once outliers are removed.



OpenMPI in the same configuration (100 samples) has no outliers and the expected level of performance.



A small number of runs have been performed across 32 nodes. MVAPICH2 2.2 performance there has been much worse; for example, at the 16k data size the time was 134,000 us.
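
The multi-node runs were launched the same way, scaled up; a sketch (the hostfile path and per-node task count are assumptions):

# 32 nodes x 32 tasks per node = 1024 ranks (hostfile path illustrative):
mpiexec -f ./hostfile -np 1024 ./osu_allgather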



For the above, I have raw data and plots that I can share if those would be helpful.



The configuration for 2.2rc1:

./configure --prefix=$SW_BLDDIR \
    --with-pbs=/opt/torque \
    --enable-fortran=yes \
    --enable-cxx \
    --with-device=ch3:mrail \
    --with-rdma=gen2



The configuration for 2.2:

./configure --prefix=$SW_BLDDIR \
    --with-pbs=/opt/torque \
    --with-pm=hydra \
    --with-device=ch3:mrail \
    --with-rdma=gen2 \
    --with-hwloc



We have also tried a new configuration with 2.2 that explicitly calls out the IB interface.

./configure --prefix=$SW_BLDDIR \
    --with-pbs=/opt/torque \
    --with-pm=hydra \
    --with-device=ch3:nemesis:ib \
    --with-hwloc \
    --with-rdma=gen2



With this configuration, the application dies with a bus error:

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 42148 RUNNING AT mod-pbs-c01.ornl.gov
=   EXIT CODE: 7
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
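
Before re-running, we can also check the node for one common cause of SIGBUS in shared-memory transports: an exhausted /dev/shm segment. The commands below are generic diagnostics, not MVAPICH2-specific:

# A nearly full shared-memory filesystem can produce SIGBUS on mmap'd segments.
df -h /dev/shm
# Locked-memory limits also matter for the InfiniBand transport.
ulimit -l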



- What is the likely solution to the single-node performance issue?

- What configuration should be given to use the IB adaptors?
