[mvapich-discuss] Performance difference in MPI_Allreduce calls between MVAPICH2-GDR and OpenMPI
Yussuf Ali
yussuf.ali at jaea.go.jp
Tue Jan 22 20:12:30 EST 2019
Dear MVAPICH developers and users,
In our software we noticed a performance degradation in the MPI_Allreduce
calls when using MVAPICH2-GDR compared to OpenMPI.
The software (a Krylov solver) runs several iterations, and in each iteration
data is reduced twice using MPI_Allreduce.
The send and receive buffers are both allocated as device memory on the GPU.
We measured the total time of the MPI_Allreduce calls.
16 GPU case (V100):

MVAPICH2-GDR (2.3)
1. MPI_Allreduce: 0.27 seconds
2. MPI_Allreduce: 1.9 seconds

OpenMPI
1. MPI_Allreduce: 0.10 seconds
2. MPI_Allreduce: 0.19 seconds

The message sizes are:
1. MPI_Allreduce: 720 bytes
2. MPI_Allreduce: 1,160 bytes
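
For reference, the measurement corresponds to a timing loop along the
following lines. This is a minimal sketch, not our actual solver code: the
iteration count, the MPI_DOUBLE datatype, and the MPI_SUM operation are
assumptions for illustration, and the buffer is sized to match the 720-byte
message of the first call. It relies on a CUDA-aware MPI accepting device
pointers directly.

/* Minimal sketch: time repeated MPI_Allreduce calls on CUDA device
 * buffers with a CUDA-aware MPI. Iteration count, datatype, and
 * reduction op are illustrative assumptions, not the solver's values. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    const int n = 720 / sizeof(double);  /* 90 doubles = 720 bytes */
    const int iters = 1000;              /* assumed iteration count */

    double *d_send, *d_recv;             /* device memory, as in our solver */
    cudaMalloc((void **)&d_send, n * sizeof(double));
    cudaMalloc((void **)&d_recv, n * sizeof(double));
    cudaMemset(d_send, 0, n * sizeof(double));

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; ++i)
        MPI_Allreduce(d_send, d_recv, n, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("total MPI_Allreduce time: %f s\n", t1 - t0);

    cudaFree(d_send);
    cudaFree(d_recv);
    MPI_Finalize();
    return 0;
}

Such a reproducer would be built with the MPI compiler wrapper (e.g.
mpicc allreduce_bench.c -lcudart) and run with one rank per GPU.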
Are there any parameters to tune the MPI_Allreduce performance in
MVAPICH2-GDR?
Thank you for your help,
Yussuf