[mvapich-discuss] Best configure / environment settings for Mellanox QDR with RH6.6-native InfiniBand support?

Hari Subramoni subramoni.1 at osu.edu
Mon Apr 6 14:33:57 EDT 2015


Dear Filippo,

Apart from what I mentioned earlier, you could set MV2_NUM_PORTS=2 to
improve the bandwidth. Please refer to the following section of the
userguide for more information about this parameter.

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1-userguide.html#x1-19100011.33
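As a sketch, assuming an mpirun_rsh launch (the process count, hostfile and
application name below are placeholders), the parameter can be passed directly
on the command line:

  # Use both ports of the HCA on each node for higher aggregate bandwidth.
  mpirun_rsh -np 64 -hostfile hosts MV2_NUM_PORTS=2 ./your_mpi_app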

These flags are compatible with the different thread granularities
(--enable-threads=multiple or runtime).

We would recommend using mpirun_rsh for the best startup performance. In the
2.1 release, we've made significant enhancements to startup performance; you
should see near-constant time spent in MPI_Init as the size of the job
increases.
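If you would like to measure this yourself, one rough sketch is to run the
osu_init startup test from the OSU micro-benchmarks at a few job sizes and
compare the reported MPI_Init times; the install path, hostfile and process
counts below are assumptions to adjust for your system:

  mpirun_rsh -np 64  -hostfile hosts $PREFIX/libexec/osu-micro-benchmarks/mpi/startup/osu_init
  mpirun_rsh -np 256 -hostfile hosts $PREFIX/libexec/osu-micro-benchmarks/mpi/startup/osu_init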

I will get back to you soon about the MPIT variables.

Best Regards,
Hari.

On Mon, Apr 6, 2015 at 6:24 AM, Filippo Spiga <spiga.filippo at gmail.com>
wrote:

> Dear Hari,
>
> I am resuming this old conversation out of curiosity; I have a few
> additional questions...
>
> 1. Which flags do you suggest for a Connect-IB FDR dual-rail cluster like
> Wilkes?
> 2. Are those flags all compatible with "--enable-threads=multiple|runtime"?
> 3. By enabling "--enable-mpit-pvars=" in MV2-2.1, what level of control over
> the behavior of the MPI library can be achieved?
> 4. What is your opinion on mpirun_rsh/hydra/MPI-2 in terms of startup
> performance?
>
> Thanks in advance
>
> F
>
> On Jan 14, 2015, at 7:43 PM, Hari Subramoni <subramoni.1 at osu.edu> wrote:
> > Dear Chris,
> >
> > Here are some suggestions to get the best performance out of MVAPICH2 for
> > your system. Please let us know if you face any performance or
> > functionality issues and we will be glad to work with you on them.
> >
> > Flags to remove
> > ============
> > --enable-registration-cache - This is on by default; you do not need to
> >                               mention it
> > --with-pm=hydra             - Remove this; mpirun_rsh gives better startup
> >                               performance
> >                             - mpirun_rsh is used by default (no configure
> >                               flags needed)
> > --enable-rdma-cm            - Remove this and replace it with
> >                               "--disable-rdma-cm"
> >                             - You do not need RDMA_CM for IB-based clusters
> >
> > Flags to add
> > =========
> > --enable-mcast  - Allows you to use InfiniBand HW multicast for MPI_Bcast,
> >                   MPI_Scatter and MPI_Allreduce
> >                 - Add MV2_USE_MCAST=1 at runtime to activate it
> >                 - Refer to the following section of the userguide for more
> >                   details on the system requirements for multicast and how
> >                   to activate it at runtime:
> >                   http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1rc1-userguide.html#x1-600006.8
> >
> > --enable-hybrid - Use a hybrid of InfiniBand transport protocols (RC, UD,
> >                   XRC) for communication
> >                 - Refer to the following section of the userguide for more
> >                   details on how to activate it at runtime:
> >                   http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1rc1-userguide.html#x1-630006.11
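> >
> > Putting the flags above together, a revised configure line might look like
> > the sketch below; the install prefix and the flags carried over from your
> > original command are assumptions to adjust as needed:
> >
> > ./configure --prefix=/usr/local/mvapich2-2.1rc1 --enable-fast=O3,ndebug --enable-f77 --enable-fc \
> >             --disable-cxx --enable-romio --enable-versioning --enable-threads=runtime \
> >             --enable-rsh --enable-shared --enable-static --enable-yield=sched_yield \
> >             --disable-rdma-cm --enable-mcast --enable-hybrid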
> >
> > You also mentioned that you were getting poor intra-node performance. Are
> > your codes pure MPI or MPI+OpenMP? If you are using MPI+OpenMP, it is
> > possible that oversubscription is happening due to improper mapping of
> > processes to cores. If this is the case, please run your application with
> > "MV2_ENABLE_AFFINITY=0". Please refer to the following section of the
> > userguide for more details on how to use the core-mapping runtime
> > parameters properly.
> >
> > http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1rc1-userguide.html#x1-540006.5
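> >
> > For example, a hybrid MPI+OpenMP launch with MVAPICH2's affinity disabled
> > might look like the sketch below (the process and thread counts, hostfile
> > and application name are placeholders):
> >
> > # Let OpenMP threads spread over cores instead of being pinned with the rank.
> > mpirun_rsh -np 4 -hostfile hosts MV2_ENABLE_AFFINITY=0 OMP_NUM_THREADS=8 ./your_hybrid_app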
> >
> > MVAPICH2-2.1rc1 has some performance optimizations for point-to-point
> > send/recv operations. You should see benefits in message rate and bandwidth
> > with MVAPICH2-2.1rc1 over MVAPICH2-2.0.1, as well as a reduced memory
> > footprint.
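> >
> > If you want to verify this on your system, one option is the OSU
> > micro-benchmarks bundled with MVAPICH2; the install path below is an
> > assumption based on a default build, and node01/node02 are placeholder
> > host names:
> >
> > # Point-to-point bandwidth and message rate between two nodes.
> > mpirun_rsh -np 2 node01 node02 $PREFIX/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
> > mpirun_rsh -np 2 node01 node02 $PREFIX/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr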
> >
> > The following tutorial gives several hints on how to optimize and tune MPI
> > and PGAS applications using MVAPICH.
> >
> > http://mug.mvapich.cse.ohio-state.edu/static/media/mug/presentations/2014/tutorial.pdf
> >
> > Regards,
> > Hari.
> >
> > On Wed, Jan 14, 2015 at 10:52 AM, Chris Green <greenc at fnal.gov> wrote:
> > Hi,
> >
> > This is a question separated out from my previous issue ("Compilation error
> > for mvapich2-2.0.1 with disabled C++ bindings"), per Jonathan's suggestion.
> >
> > A scientific collaboration with which we work is using Mellanox QDR cards
> > as part of a multi-node / multi-core data acquisition / processing chain
> > that we developed using MPI, and we have development systems using them as
> > well. Until recently we had been using OFED 1.5.4.1 with mvapich 1.9 on
> > SLF6.3-ish (Scientific Linux Fermi is a RHEL variant), but we are switching
> > to the RHEL6.6-native InfiniBand drivers and support libraries and are
> > therefore in the position of building mvapich ourselves (and providing
> > recommendations on building and using it to our collaborators).
> >
> > Given that we know the mvapich libraries will be linked against code
> > compiled with compilers other than the system's native GCC (usually more
> > modern GCC versions), we had to choose between tying the mvapich build to a
> > particular GCC version or deactivating the C++ bindings. Since we don't use
> > them for this application, we chose the latter. Here, then, is what we have
> > for a configure command:
> > ./configure --prefix=/usr/local/mvapich2-2.0.1 --enable-fast=O3,ndebug --enable-f77 --enable-fc \
> >             --disable-cxx --enable-romio --enable-versioning --enable-threads=runtime --enable-registration-cache \
> >             --enable-rsh --enable-shared --enable-static --enable-yield=sched_yield --enable-rdma-cm --with-pm=hydra
> >
> > Can anyone tell me if there is a better configuration for the use outlined
> > above, or anything we should be doing by way of setting environment
> > variables or other system configuration to get the best bandwidth? In the
> > unenlightened past, we have been in the somewhat strange position of getting
> > better inter-node bandwidth than intra-node, so I know what we were doing in
> > the OFED era wasn't necessarily optimal. Our MPI use is generally centered
> > around MPI_Isend() and MPI_Irecv(), if that is relevant.
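> >
> > (For reference, we compare intra-node and inter-node bandwidth with
> > something like the following, where the host names and benchmark path are
> > placeholders; two ranks on the same node versus one rank on each of two
> > nodes:
> >
> > mpirun_rsh -np 2 nodeA nodeA $PREFIX/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
> > mpirun_rsh -np 2 nodeA nodeB $PREFIX/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw )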
> >
> > Thanks for any help you can give,
> >
> > Chris.
> > --
> > Chris Green <greenc at fnal.gov>, FNAL CS/SCD/ADSS/SSI/TAC;
> > phone: (630) 840-2167; Skype: chris.h.green;
> > IM: greenc at jabber.fnal.gov, chissgreen (AIM, Yahoo),
> > chissg at hotmail.com (MSNM), chris.h.green (Google Talk).
> >
> >
>
> --
> Mr. Filippo SPIGA, M.Sc.
> http://filippospiga.info ~ skype: filippo.spiga
>
> «Nobody will drive us out of Cantor's paradise.» ~ David Hilbert
>
>