[mvapich-discuss] Single node performance issue

Hari Subramoni subramoni.1 at osu.edu
Mon Oct 24 18:54:54 EDT 2016


Hello,

Sorry to hear that you're getting performance variations. Looks like there
are a few issues here.

Please do not use "--with-device=ch3:nemesis:ib"; it is not the default
communication channel. You do not need to specify any extra configure
options to build for the InfiniBand channel, since ch3:mrail with gen2 is
the default. Please see the following section of the userguide for more
information on this. Having said that, the flags you have used will not
negatively affect performance.

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2-userguide.html#x1-120004.4
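
For reference, a minimal build that picks up the default OFA-IB-CH3
channel could look roughly like the sketch below; the install prefix and
Torque path are taken from your configure lines and may need adjusting for
your site:

./configure --prefix=$SW_BLDDIR \
    --with-pbs=/opt/torque \
    --enable-fortran=yes \
    --enable-cxx
make -j 8 && make install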

Can you try running the single node case after disabling shared memory
support (MV2_USE_SHARED_MEM=0)?
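
For example, with the Hydra launcher that would look roughly like the line
below; the task count and benchmark path are placeholders matching your
single-node runs:

mpiexec -n 32 -env MV2_USE_SHARED_MEM 0 ./osu_allgather

or equivalently with mpirun_rsh:

mpirun_rsh -np 32 -hostfile hosts MV2_USE_SHARED_MEM=0 ./osu_allgather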

Regards,
Hari.

On Mon, Oct 24, 2016 at 10:25 AM, Mayer, Benjamin W. <mayerbw at ornl.gov>
wrote:

> We are seeing a performance issue with MVAPICH2 2.2 and 2.2rc1 on a single
> node, and a worse one on multiple nodes. This behavior is not seen while
> using OpenMPI.
>
>
>
> All data come from running the OSU Allgather microbenchmark with 50-100
> samples per configuration. MVAPICH2 has been compiled and tested with Intel
> 2017.0.098, Intel 2016.1, and GNU 5.3.0. The machine has Mellanox CX4
> adapters.
>
>
>
> With MVAPICH2 2.2 on a single node (32 tasks, 1 thread, ~50 samples), we
> see high variability at small data sizes. A large percentage of runs have a
> normal run time (4-5 us), but a moderate number are up to 2x slower
> (~10 us), a handful are extreme outliers (~10,000 us) for each data size,
> and a small number are killed because they never finish.
>
>
>
> MVAPICH2 2.2rc1 in the same configuration (80 samples) behaves similarly,
> except that it has about 4x the rate of extreme outliers and is generally
> a bit slower once the outliers are removed.
>
>
>
> OpenMPI in the same configuration (100 samples) has no outliers and the
> expected level of performance.
>
>
>
> A small number of runs have been performed across 32 nodes, where MVAPICH2
> 2.2 performance has been much worse; for example, at the 16k data size the
> time was 134,000 us.
>
>
>
> For the above, I have raw data and plots that I can share if those would
> be helpful.
>
>
>
> The configuration for 2.2rc1:
>
> ./configure --prefix=$SW_BLDDIR \
>     --with-pbs=/opt/torque \
>     --enable-fortran=yes \
>     --enable-cxx \
>     --with-device=ch3:mrail \
>     --with-rdma=gen2
>
>
>
> The configuration for 2.2:
>
> ./configure --prefix=$SW_BLDDIR \
>     --with-pbs=/opt/torque \
>     --with-pm=hydra \
>     --with-device=ch3:mrail \
>     --with-rdma=gen2 \
>     --with-hwloc
>
>
>
> We have tried a new configuration with 2.2 to explicitly select the IB
> interface:
>
> ./configure --prefix=$SW_BLDDIR \
>     --with-pbs=/opt/torque \
>     --with-pm=hydra \
>     --with-device=ch3:nemesis:ib \
>     --with-hwloc \
>     --with-rdma=gen2
>
>
>
> This configuration ends with the application dying with a bus error:
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 42148 RUNNING AT mod-pbs-c01.ornl.gov
> =   EXIT CODE: 7
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
>
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
>
>
>
> - What is the likely solution to the single-node performance issue?
>
> - What configuration should be given to use the IB adapters?
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>