[mvapich-discuss] Optimization flags for large messages and asynchronous progress

Krishna Kandalla kandalla at cse.ohio-state.edu
Fri Jan 4 10:19:38 EST 2013


Hello Amirreza,
       Thanks for reporting this issue. Could you please let us know if you
are using all of the 32 cores per node? You could use the parameters
MPICH_ASYNC_PROGRESS=1
and MV2_ENABLE_AFFINITY=0 with mvapich2-1.7 to enable the async-progress
feature. However, this involves each process creating its own thread to
progress the communication, so you may need a sufficient number of idle
cores.
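
For reference, one way to pass these variables is on the mpirun_rsh
command line (a sketch only: the process count, the host file name, and
the ./channel_flow binary below are placeholders for your job, not a
specific recommendation):

    mpirun_rsh -np 64 -hostfile hosts MPICH_ASYNC_PROGRESS=1 MV2_ENABLE_AFFINITY=0 ./channel_flow

Since the progress thread runs alongside your compute code, it may also be
worth initializing with MPI_Init_thread and checking the provided thread
level. Whether the application itself must request MPI_THREAD_MULTIPLE for
this feature is an assumption on our part, but the check is cheap either
way:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int provided;

        /* Ask for full thread support; the async-progress thread is
         * created internally by the library, but checking 'provided'
         * confirms the installed build supports multithreading. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE)
            fprintf(stderr, "thread level %d < MPI_THREAD_MULTIPLE\n",
                    provided);

        /* ... existing application code ... */

        MPI_Finalize();
        return 0;
    }
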
       Regarding your current set of flags, we noticed that your
MV2_SMP_EAGERSIZE is quite high; we do not usually recommend such a large
value. You may use the MV2_RNDV_PROTOCOL environment variable in MV2 to
change the rendezvous protocol. It can be set to RPUT, RGET, or R3. (Please
refer to the following link for more information about this parameter:
http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.9a2.html#x1-20500011.56
)
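
For example (again just a sketch: RGET below is only one of the three
options, and the launcher arguments are the same placeholders as above,
not a tuned recommendation for your application):

    mpirun_rsh -np 64 -hostfile hosts MV2_RNDV_PROTOCOL=RGET ./channel_flow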

Thanks,
Krishna



On Fri, Jan 4, 2013 at 1:44 AM, Amirreza Rastegari <amirreza at umich.edu> wrote:

> Hi,
>
> I'm a user of SDSC Trestles (a cluster of SMP nodes with AMD Magny-Cours
> processors, 32 cores per node, 8 cores per socket; system specs are listed
> here: http://www.sdsc.edu/us/resources/trestles/). I'm having some
> performance issues with a code that is used for turbulent channel flow
> simulations. After analysing the performance, I have found that the
> bottleneck is in the communications. In the code I have tried to overlap
> the communications with the computations; however, it seems that
> asynchronous progress is not supported by the MPI library. The default MPI
> implementation on Trestles is mvapich2/1.5.1p1 (which I have used for the
> tests), but mvapich2/1.7 is also accessible. Is it possible for you to
> help me? Is there any flag that I'm missing? Can you please help me select
> optimized flags for my application?
>
> Currently the code uses persistent communication in ready mode
> (MPI_Rsend_init), and the bulk of the computation is done between
> MPI_Startall and MPI_Waitall. The code looks like this:
>
> int main(int argc, char *argv[]){
>   some definitions;
>   some initializations;
>
>   MPI_Init(&argc, &argv);
>
>   MPI_Rsend_init( channel to the rank before );
>   MPI_Rsend_init( channel to the rank after );
>   MPI_Recv_init( channel to the rank before );
>   MPI_Recv_init( channel to the rank after );
>
>   for (timestep=0; timestep<Time; timestep++)
>   {
>     prepare data for send;
>     MPI_Startall();
>
>     do computations;
>
>     MPI_Waitall();
>
>     do work on the received data;
>   }
>   MPI_Finalize();}
>
> Unfortunately, the actual data transfer does not start until the
> computations are done, and I don't understand why. The network uses a QDR
> InfiniBand interconnect. Each message is 23 MB (46 MB is sent in total per
> rank: one 23 MB message to the next rank and one to the previous). The
> ranks form a ring (i.e., rank N communicates with rank 1 and rank N-1),
> and I need to increase the size of the problem to extend my studies, which
> means messages as large as 92 MB.
>
> Currently I use the following flags:
> VIADEV_RNDV_PROTOCOL=ASYNC
> MV2_SMP_EAGERSIZE=46M
> MV2_CPU_BINDING_LEVEL=socket
> MV2_CPU_BINDING_POLICY=bunch
>
> Can you please help me select the flags? Is there a set of optimal flags
> for such applications? Also, should I use persistent communication, or
> should I use MPI_Isend with a regular MPI_Recv, MPI_Send with MPI_Irecv,
> or MPI_Isend with MPI_Irecv? Is there a specific combination for which
> asynchronous progress works? Do you recommend creating a thread on each
> core that just repeatedly calls MPI_Test?
>
> Thank you very much,
> Amirreza
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>

