[mvapich-discuss] Performance difference in MPI_Allreduce calls between MVAPICH2-GDR and OpenMPI

Yussuf Ali yussuf.ali at jaea.go.jp
Tue Jan 22 20:12:30 EST 2019


Dear MVAPICH developers and users,

 

In our software we noticed a performance degradation in the MPI_Allreduce
calls when using MVAPICH2-GDR compared to OpenMPI.

The software (a Krylov solver) runs several iterations, and in each iteration
data is reduced twice using MPI_Allreduce.

The send and receive buffers are both allocated as device memory on the GPU.
We measured the total time spent in the MPI_Allreduce calls.
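
For reference, the call pattern looks roughly like the sketch below. The
datatype, reduction operation, iteration count, and buffer contents are
placeholders only (we use MPI_DOUBLE/MPI_SUM here purely for illustration);
the message counts correspond to the 720-byte and 1,160-byte sizes reported
further down (90 and 145 doubles).

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

/* Placeholder counts: 90 doubles = 720 bytes, 145 doubles = 1,160 bytes. */
#define N1 90
#define N2 145
#define ITERATIONS 1000   /* placeholder iteration count */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Send/receive buffers allocated as device memory on the GPU. */
    double *d_send1, *d_recv1, *d_send2, *d_recv2;
    cudaMalloc((void **)&d_send1, N1 * sizeof(double));
    cudaMalloc((void **)&d_recv1, N1 * sizeof(double));
    cudaMalloc((void **)&d_send2, N2 * sizeof(double));
    cudaMalloc((void **)&d_recv2, N2 * sizeof(double));
    cudaMemset(d_send1, 0, N1 * sizeof(double));
    cudaMemset(d_send2, 0, N2 * sizeof(double));

    double t1 = 0.0, t2 = 0.0, t;

    for (int it = 0; it < ITERATIONS; ++it) {
        /* ... local solver work producing the partial results ... */

        /* First reduction per iteration (timed in total). */
        t = MPI_Wtime();
        MPI_Allreduce(d_send1, d_recv1, N1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
        t1 += MPI_Wtime() - t;

        /* ... more local work ... */

        /* Second reduction per iteration (timed in total). */
        t = MPI_Wtime();
        MPI_Allreduce(d_send2, d_recv2, N2, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
        t2 += MPI_Wtime() - t;
    }

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("Allreduce 1: %f s, Allreduce 2: %f s\n", t1, t2);

    cudaFree(d_send1); cudaFree(d_recv1);
    cudaFree(d_send2); cudaFree(d_recv2);
    MPI_Finalize();
    return 0;
}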

 

16 GPU case (V100)

 

MVAPICH2-GDR (2.3)

1. MPI_Allreduce: 0.27 seconds

2. MPI_Allreduce: 1.9 seconds

 

OpenMPI

1. MPI_Allreduce: 0.10 seconds

2. MPI_Allreduce: 0.19 seconds

 

The data sizes are:

1. MPI_Allreduce: 720 bytes

2. MPI_Allreduce: 1,160 bytes

 

Are there any parameters to tune the MPI_Allreduce performance in
MVAPICH2-GDR?

 

Thank you for your help,

Yussuf
