[mvapich-discuss] Non-MPI_THREAD-SINGLE mode with enabled MV2 affinity?

Thiago Quirino - NOAA Federal thiago.quirino at noaa.gov
Wed Nov 13 12:08:23 EST 2013


Thanks again, Jonathan.

Quick question. Our supercomputing infrastructure uses QDR InfiniBand. I've
heard that a single core of a node is capable of saturating the PCIe bus.
Using MVAPICH2, I've measured unidirectional transfer rates of up to 430
MB/s over InfiniBand with a simple echo-like MPI program (varying the
message size from 10 KB to 100 MB). Given that, is there a benefit in terms
of overall performance to using MPI_THREAD_MULTIPLE to let multiple threads
communicate simultaneously, versus using MPI_THREAD_SERIALIZED and forcing
the threads to synchronize around the MPI calls? Let's suppose that each
thread uses a different communicator to avoid message collisions.
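
To make the comparison concrete, here is roughly the pattern I have in mind
(only a sketch; the helper names, the per-thread communicators, and the
pthread mutex are illustrative, not taken from our actual application):

#include <mpi.h>
#include <pthread.h>

#define NTHREADS 4

static MPI_Comm thread_comm[NTHREADS];       /* one communicator per thread */
static pthread_mutex_t mpi_lock = PTHREAD_MUTEX_INITIALIZER;

/* MPI_THREAD_SERIALIZED case: threads synchronize around every MPI call.  */
static void serialized_send(int tid, void *buf, int count, int dest)
{
    pthread_mutex_lock(&mpi_lock);
    MPI_Send(buf, count, MPI_BYTE, dest, tid, thread_comm[tid]);
    pthread_mutex_unlock(&mpi_lock);
}

/* MPI_THREAD_MULTIPLE case: threads call MPI concurrently, each on its own
   communicator, so their message matching stays independent.              */
static void concurrent_send(int tid, void *buf, int count, int dest)
{
    MPI_Send(buf, count, MPI_BYTE, dest, tid, thread_comm[tid]);
}

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_SERIALIZED)
        MPI_Abort(MPI_COMM_WORLD, 1);    /* need at least SERIALIZED here */

    for (int i = 0; i < NTHREADS; i++)
        MPI_Comm_dup(MPI_COMM_WORLD, &thread_comm[i]);

    /* ... spawn NTHREADS pthreads that call one of the helpers above ... */

    for (int i = 0; i < NTHREADS; i++)
        MPI_Comm_free(&thread_comm[i]);
    MPI_Finalize();
    return 0;
}

My question boils down to whether the concurrent path can actually move more
data than the serialized one, or whether a single thread already saturates
the HCA.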

Thank you!
Thiago.



On Tue, Nov 12, 2013 at 3:39 PM, Jonathan Perkins <
jonathan.lamar.perkins at gmail.com> wrote:

> Hi, sorry for the delay.  I talked with a few others in the group and
> we agreed on the following:
>
> According to the specification you must use either MPI_THREAD_MULTIPLE
> or MPI_THREAD_SERIALIZED with the implementation that you have
> described.
>
> In your case where the application serializes the MPI calls you should
> get better performance with MPI_THREAD_SERIALIZED compared to
> MPI_THREAD_MULTIPLE.
>
> If you use MPI_THREAD_MULTIPLE you may want to update your application
> to make concurrent MPI calls instead of serialized ones, as some of the
> communication may then be overlapped.
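>
> As a rough illustration (just a sketch, not code from your application;
> the per-thread communicators, buffers, and peer ranks are assumed), each
> thread could drive its own non-blocking exchange so that the four
> transfers can be in flight at the same time:
>
> #include <mpi.h>
>
> /* Called concurrently by each pthread with its own communicator; under
>    MPI_THREAD_MULTIPLE no application-level locking is required.        */
> static void thread_exchange(int tid, MPI_Comm comm, int peer,
>                             void *sendbuf, void *recvbuf, int count)
> {
>     MPI_Request req[2];
>     MPI_Irecv(recvbuf, count, MPI_BYTE, peer, tid, comm, &req[0]);
>     MPI_Isend(sendbuf, count, MPI_BYTE, peer, tid, comm, &req[1]);
>     MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
> }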
>
>
> On Tue, Nov 12, 2013 at 2:10 PM, Thiago Quirino - NOAA Federal
> <thiago.quirino at noaa.gov> wrote:
> > Thank you so much, Jonathan. I will try this out.
> >
> > Is there any advantage to using MPI_THREAD_{FUNNELED,SERIALIZED,MULTIPLE}
> > versus MPI_THREAD_SINGLE for serialized MPI calls performed by multiple
> > threads (i.e. when all threads synchronize around the MPI calls)?
> >
> > Thanks again,
> > Thiago.
> >
> >
> >
> > On Tue, Nov 12, 2013 at 1:56 PM, Jonathan Perkins
> > <jonathan.lamar.perkins at gmail.com> wrote:
> >>
> >> Hello Thiago.  Perhaps you can try an alternative to
> >> MV2_ENABLE_AFFINITY.  If you use the hydra process manager
> >> (mpiexec.hydra), you can disable the library affinity and use the
> >> launcher affinity instead.  In this case the other threading levels
> >> will be available to you.
> >>
> >> Please see
> >>
> >> https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager#Process-core_Binding
> >> for more information on how to use this hydra feature.  Also please do
> >> not forget to set MV2_ENABLE_AFFINITY to 0.
> >>
> >> Please let us know if this helps.
> >>
> >> On Fri, Nov 8, 2013 at 6:41 PM, Thiago Quirino - NOAA Federal
> >> <thiago.quirino at noaa.gov> wrote:
> >> > Hi, folks. Quick question about MVAPICH2 and affinity support.
> >> >
> >> > Is it possible to invoke MPI_Init_thread with any mode other than
> >> > "MPI_THREAD_SINGLE" and still use "MV2_ENABLE_AFFINITY=1"? In my hybrid
> >> > application I mix MPI with raw Pthreads (not OpenMP). I start 4 MPI
> >> > tasks on each 16-core node, where each node has 2 sockets with 8
> >> > Sandy Bridge cores each. Each of the 4 MPI tasks then spawns 4 pthreads
> >> > for a total of 16 pthreads/node, or 1 pthread/core. Within each MPI
> >> > task, the MPI calls are serialized among the 4 pthreads, so I can use
> >> > any MPI_THREAD_* mode, but I don't know which mode will work best. I
> >> > want to assign each of the 4 MPI tasks in a node a set of 4 cores using
> >> > MV2_CPU_MAPPING (e.g. export
> >> > MV2_CPU_MAPPING=0,1,2,3:4,5,6,7:8,9,10,11:12,13,14,15) so that the 4
> >> > pthreads spawned by each MPI task can migrate to any processor within
> >> > its exclusive CPU set of 4 cores.
> >> >
> >> > Is that possible with modes other than MPI_THREAD_SINGLE? If not, do
> >> > you foresee any issues with using MPI_THREAD_SINGLE while serializing
> >> > the MPI calls among the 4 pthreads of each MPI task? That is, is there
> >> > any advantage to using MPI_THREAD_FUNNELED or MPI_THREAD_SERIALIZED
> >> > versus MPI_THREAD_SINGLE for serialized calls among pthreads?
> >> >
> >> > Thank you so much, folks. Any help is much appreciated.
> >> >
> >> > Best,
> >> > Thiago.
> >> >
> >> >
> >> > ---------------------------------------------------
> >> > Thiago Quirino, Ph.D.
> >> > NOAA Hurricane Research Division
> >> > 4350 Rickenbacker Cswy.
> >> > Miami, FL 33139
> >> > P: 305-361-4503
> >> > E: Thiago.Quirino at noaa.gov
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Jonathan Perkins
> >
> >
> >
> >
>
>
>
> --
> Jonathan Perkins
>