[mvapich-discuss] Best configure / environment settings for Mellanox QDR with RH6.6-native InfiniBand support?

Hari Subramoni subramoni.1 at osu.edu
Wed Jan 14 14:43:36 EST 2015


Dear Chris,

Here are some suggestions to get the best performance out of MVAPICH2 on
your system. Please let us know if you run into any performance or
functionality issues and we will be glad to work with you on them.

Flags to remove
===============
--enable-registration-cache - This is on by default; you do not need to
                              specify it
--with-pm=hydra  - Remove this; mpirun_rsh gives better startup performance
                 - mpirun_rsh is used by default (no configure flags needed)
--enable-rdma-cm - Remove this and replace it with "--disable-rdma-cm"
                 - You do not need RDMA_CM for IB-based clusters

Flags to add
============
--enable-mcast  - Allows you to use InfiniBand hardware multicast for
                  MPI_Bcast, MPI_Scatter, and MPI_Allreduce
                - Add MV2_USE_MCAST=1 at runtime to activate it
                - Refer to the following section of the userguide for more
                  details on the system requirements for multicast to work
                  and how to activate it at runtime:

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1rc1-userguide.html#x1-600006.8

--enable-hybrid - Use a hybrid of InfiniBand transport protocols (RC, UD,
                  XRC) for communication
                - Refer to the following section of the userguide for more
                  details on how to activate it at runtime:

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1rc1-userguide.html#x1-630006.11
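
Putting the two lists together, here is a sketch of what your configure
line might look like with these changes (keeping the prefix and the other
flags from your original command; adjust paths and version numbers for
your site):

    ./configure --prefix=/usr/local/mvapich2-2.0.1 --enable-fast=O3,ndebug --enable-f77 --enable-fc \
                --disable-cxx --enable-romio --enable-versioning --enable-threads=runtime \
                --enable-rsh --enable-shared --enable-static --enable-yield=sched_yield \
                --disable-rdma-cm --enable-mcast --enable-hybrid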

You also mentioned that you were getting poor intra-node performance. Are
your codes pure MPI or MPI+OpenMP? If you are using MPI+OpenMP, it is
possible that cores are being oversubscribed because of an improper mapping
of processes to cores. If this is the case, please run your application
with "MV2_ENABLE_AFFINITY=0". Please refer to the following section of the
userguide for more details on how to use the proper core-mapping runtime
parameters:

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1rc1-userguide.html#x1-540006.5
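
As a concrete example (a sketch only; the process count, hostfile, and
application name below are placeholders for your own values), runtime
parameters such as these can be passed directly on the mpirun_rsh command
line:

    mpirun_rsh -np 16 -hostfile ./hosts MV2_USE_MCAST=1 MV2_ENABLE_AFFINITY=0 ./your_mpi_app

Only set MV2_ENABLE_AFFINITY=0 if you are in the MPI+OpenMP
oversubscription situation described above; for pure MPI codes the default
affinity settings are usually what you want.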

MVAPICH2-2.1rc1 has some performance optimizations for point-to-point
send/recv operations. You should see benefits in message rate and bandwidth
with MVAPICH2-2.1rc1 over MVAPICH2-2.0.1, as well as a reduced memory
footprint.
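
If you would like to quantify the difference on your own nodes, one option
is to run the OSU micro-benchmarks that ship with MVAPICH2 against both
builds (the install prefix and benchmark path below are illustrative and
may differ on your system), for example:

    # inter-node: two processes on two different hosts
    mpirun_rsh -np 2 node1 node2 <install-prefix>/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
    mpirun_rsh -np 2 node1 node2 <install-prefix>/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency

    # intra-node: both processes on the same host
    mpirun_rsh -np 2 node1 node1 <install-prefix>/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw

Comparing the inter-node and intra-node numbers should tell you quickly
whether the intra-node path is still underperforming.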

The following tutorial gives several hints on how to optimize and tune MPI
and PGAS applications using MVAPICH:

http://mug.mvapich.cse.ohio-state.edu/static/media/mug/presentations/2014/tutorial.pdf

Regards,
Hari.

On Wed, Jan 14, 2015 at 10:52 AM, Chris Green <greenc at fnal.gov> wrote:

>  Hi,
>
> This is a question separated out from my previous issue ("Compilation
> error for mvapich2-2.0.1 with disabled C++ bindings"), per Jonathan's
> suggestion.
>
> A scientific collaboration with which we work is using Mellanox QDR cards
> as part of a multi-node / multi-core data acquisition / processing chain
> developed by us using MPI, and we have development systems using them also.
> Until recently we have been using OFED1.5.4.1 with mvapich 1.9 on
> SLF6.3-ish (Scientific Linux Fermi is a RHEL variant), but we are switching
> to using the RHEL6.6-native InfiniBand drivers and support libraries and
> are therefore in the position of building mvapich ourselves (and providing
> recommendations on build and use thereof to our collaborators).
>
> Given that we know that the mvapich libraries will be linked to code
> compiled using compilers other than the system's native GCC (usually more
> modern versions of GCC), we had to choose between tying the mvapich build
> to a particular GCC version or deactivating the C++ bindings. Since we
> don't use them for this application, we chose the latter. Here then, is
> what we have for a configure command:
>
> ./configure --prefix=/usr/local/mvapich2-2.0.1 --enable-fast=O3,ndebug --enable-f77 --enable-fc \
>             --disable-cxx --enable-romio --enable-versioning --enable-threads=runtime --enable-registration-cache \
>             --enable-rsh --enable-shared --enable-static --enable-yield=sched_yield --enable-rdma-cm --with-pm=hydra
>
> Can anyone tell me if there is a better configuration for the use outlined
> above, or anything we should be doing by way of setting environment
> variables or other system configuration to get the best bandwidth? In the
> unenlightened past, we have been in the somewhat strange position of
> getting better inter-node bandwidth than intra-node, so I know what we were
> doing in the OFED era wasn't necessarily optimal. Our MPI use is generally
> centered around MPI_Isend() and MPI_Irecv(), if that is relevant.
>
> Thanks for any help you can give,
>
> Chris.
>
> --
> Chris Green <greenc at fnal.gov>, FNAL CS/SCD/ADSS/SSI/TAC;
> 'phone (630) 840-2167; Skype: chris.h.green;
> IM: greenc at jabber.fnal.gov, chissgreen (AIM, Yahoo),chissg at hotmail.com (MSNM), chris.h.green (Google Talk).
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>

