[mvapich-discuss] Best configure / environment settings for Mellanox QDR with RH6.6-native InfiniBand support?

Filippo Spiga spiga.filippo at gmail.com
Mon Apr 6 06:24:56 EDT 2015


Dear Hari,

I am reviving this old conversation out of curiosity; I have a few additional questions...

1. Which flags do you suggest for a Connect-IB FDR dual-rail cluster like Wilkes?
2. Are all of those flags compatible with "--enable-threads=multiple|runtime"?
3. By enabling "--enable-mpit-pvars=" in MV2-2.1, what level of control over the behavior of the MPI library can be achieved?
4. What is your opinion of mpirun_rsh/hydra/MPI-2 in terms of startup performance?

Thanks in advance

F

On Jan 14, 2015, at 7:43 PM, Hari Subramoni <subramoni.1 at osu.edu> wrote:
> Dear Chris,
> 
> Here are some suggestions to get the best performance out of MVAPICH2 on your system. Please let us know if you face any performance or functionality issues, and we will be glad to work with you on them.
> 
> Flags to remove
> ============
> --enable-registration-cache - This is on by default; you do not need to specify it explicitly
> --with-pm=hydra  - Remove this. mpirun_rsh gives better startup performance
>                  - mpirun_rsh is used by default (no configure flags needed)
> --enable-rdma-cm - Remove this and replace it with "--disable-rdma-cm"
>                  - You do not need RDMA_CM for IB-based clusters
> 
> Flags to add
> =========
> --enable-mcast - Allows you to use InfiniBand hardware multicast for MPI_Bcast,
>                  MPI_Scatter and MPI_Allreduce
>                - Add MV2_USE_MCAST=1 at runtime to activate it
>                - Refer to the following section of the userguide for more details on the
>                  system requirements for multicast and how to activate it at runtime:
>                  http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1rc1-userguide.html#x1-600006.8
> 
> --enable-hybrid - Use a hybrid of InfiniBand transport protocols (RC, UD, XRC)
>                   for communication
>                 - Refer to the following section of the userguide for more details
>                   on how to activate it at runtime:
>                   http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1rc1-userguide.html#x1-630006.11
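> 
> Putting these suggestions together with your original configure line, the build
> and a multicast-enabled launch might look roughly like the sketch below (the
> install prefix, hostfile and process count are placeholders for your setup):
> 
> ./configure --prefix=/usr/local/mvapich2-2.1rc1 --enable-fast=O3,ndebug \
>             --enable-f77 --enable-fc --disable-cxx --enable-romio \
>             --enable-versioning --enable-threads=runtime --enable-rsh \
>             --enable-shared --enable-static --enable-yield=sched_yield \
>             --disable-rdma-cm --enable-mcast --enable-hybrid
> 
> # Launch with mpirun_rsh (the default process manager); runtime parameters such
> # as MV2_USE_MCAST=1 are passed as environment variables before the executable.
> mpirun_rsh -np 64 -hostfile ./hosts MV2_USE_MCAST=1 ./your_app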
> 
> You also mentioned that you were getting poor intra-node performance. Are your
> codes pure MPI or MPI+OpenMP? If you are using MPI+OpenMP, oversubscription may
> be occurring due to an improper mapping of processes to cores. If this is the
> case, please run your application with "MV2_ENABLE_AFFINITY=0". Please refer to
> the following section of the userguide for more details on how to use the
> core-mapping runtime parameters properly.
> 
> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1rc1-userguide.html#x1-540006.5
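> 
> For an MPI+OpenMP run, a minimal sketch of avoiding oversubscription is to
> disable MVAPICH2's affinity and let OpenMP place the threads (the rank/thread
> counts and hostfile below are just placeholders):
> 
> # e.g. 4 hybrid ranks with 8 OpenMP threads each, MVAPICH2 affinity disabled
> mpirun_rsh -np 4 -hostfile ./hosts MV2_ENABLE_AFFINITY=0 OMP_NUM_THREADS=8 ./your_hybrid_app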
> 
> MVAPICH2-2.1rc1 has some performance optimizations for point-to-point send/recv
> operations. You should see better message rate and bandwidth with
> MVAPICH2-2.1rc1 than with MVAPICH2-2.0.1, as well as a reduced memory footprint.
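> 
> One way to verify this on your own system is to run the OSU micro-benchmarks
> that ship with MVAPICH2 against both builds (the benchmark location varies by
> release; the paths and hostfiles below are illustrative):
> 
> # Inter-node bandwidth and multi-pair message rate between two hosts
> mpirun_rsh -np 2 -hostfile ./two_hosts ./osu_bw
> mpirun_rsh -np 16 -hostfile ./two_hosts ./osu_mbw_mr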
> 
> The following tutorial gives several hints on how to optimize and tune MPI and
> PGAS applications using MVAPICH.
> 
> http://mug.mvapich.cse.ohio-state.edu/static/media/mug/presentations/2014/tutorial.pdf
> 
> Regards,
> Hari.
> 
> On Wed, Jan 14, 2015 at 10:52 AM, Chris Green <greenc at fnal.gov> wrote:
> Hi,
> 
> This is a question separated out from my previous issue ("Compilation error for mvapich2-2.0.1 with disabled C++ bindings"), per Jonathan's suggestion.
> 
> A scientific collaboration with which we work is using Mellanox QDR cards as part of a multi-node / multi-core data acquisition / processing chain developed by us using MPI, and we have development systems using them also. Until recently we have been using OFED1.5.4.1 with mvapich 1.9 on SLF6.3-ish (Scientific Linux Fermi is a RHEL variant), but we are switching to using the RHEL6.6-native InfiniBand drivers and support libraries and are therefore in the position of building mvapich ourselves (and providing recommendations on build and use thereof to our collaborators).
> 
> Given that we know that the mvapich libraries will be linked to code compiled using compilers other than the system's native GCC (usually more modern versions of GCC), we had to choose between tying the mvapich build to a particular GCC version or deactivating the C++ bindings. Since we don't use them for this application, we chose the latter. Here then, is what we have for a configure command:
> ./configure --prefix=/usr/local/mvapich2-2.0.1 --enable-fast=O3,ndebug --enable-f77 --enable-fc \
>             --disable-cxx --enable-romio --enable-versioning --enable-threads=runtime --enable-registration-cache \
>             --enable-rsh --enable-shared --enable-static --enable-yield=sched_yield --enable-rdma-cm --with-pm=hydra
> 
> Can anyone tell me if there is a better configuration for the use outlined above, or anything we should be doing by way of setting environment variables or other system configuration to get the best bandwidth? In the unenlightened past, we have been in the somewhat strange position of getting better inter-node bandwidth than intra-node, so I know what we were doing in the OFED era wasn't necessarily optimal. Our MPI use is generally centered around MPI_Isend() and MPI_Irecv(), if that is relevant.
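> 
> For concreteness, the kind of comparison I have in mind is along these lines,
> using the OSU bandwidth benchmark that ships with MVAPICH2 (the host names and
> benchmark path are placeholders):
> 
> # Intra-node: both ranks on the same host
> mpiexec -n 2 -hosts node1 ./osu_bw
> # Inter-node: one rank on each of two hosts
> mpiexec -n 2 -hosts node1,node2 ./osu_bw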
> 
> Thanks for any help you can give,
> 
> Chris.
> -- 
> Chris Green <greenc at fnal.gov>, FNAL CS/SCD/ADSS/SSI/TAC;
> 'phone (630) 840-2167; Skype: chris.h.green;
> IM: greenc at jabber.fnal.gov, chissgreen (AIM, Yahoo),
> chissg at hotmail.com (MSNM), chris.h.green (Google Talk).
> 
> 
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

--
Mr. Filippo SPIGA, M.Sc.
http://filippospiga.info ~ skype: filippo.spiga

«Nobody will drive us out of Cantor's paradise.» ~ David Hilbert





