[mvapich-discuss] Performance difference in MPI_Allreduce calls between MVAPICH2-GDR and OpenMPI
Yussuf Ali
yussuf.ali at jaea.go.jp
Thu Jan 24 00:25:46 EST 2019
Dear Ammar,
thank you for your email!
No, it is not a DGX-2 system; we are using the ABCI supercomputer. The exact specification can be found here: https://abci.ai/en/about_abci/computing_resource.html
At the moment we are not able to provide you with access to the ABCI system because it is not our own. However, our organization
has purchased a DGX-2 system, which should be delivered within the next three months. At that time we may be able to provide you with access to our DGX-2 system.
I have another question regarding the OSU benchmarks. I executed the osu_bibw benchmark on the ABCI system with MVAPICH2-GDR (2.3), Intel MPI, and OpenMPI
for host-to-host (H H) communication in the inter-node case.
MVAPICH2 shows a much higher bandwidth for large messages than Intel MPI or OpenMPI. Are these results correct, or do we have a setup error in our benchmark?
Bandwidth in MB/s:

Size (bytes)  MVAPICH2-GDR(2.3)  Intel MPI   OpenMPI
1                          0.25       7.28      0
2                          1.29      14.95      0.01
4                         14.29      28.9       0.02
8                         26.32      63.53      0.05
16                        57.4      128.32      0.08
32                       113.45     234.19      0.17
64                       228.58     458.13      0.38
128                      461.36     855.57    502.7
256                      847.55    1583.76    960.93
512                     1682.65    2837.01   1856.56
1024                    3036.65    4750.74   3270.54
2048                    5136.46    7119.34   5138.45
4096                    7392.42    9262.95   7380.5
8192                    9936.17   11643.05   8366.76
16384                  11173.19   12779.45    308.49
32768                  19337.39   13080.18  18678.61
65536                  22878.94   12942.33  21546.44
131072                 23815.06   12821.71  22481.7
262144                 24305.1    15569.08  22718.96
524288                 47901.26   18774.32  22937.39
1048576                48697.25   20891.05  23036.87
2097152                49069.72   22002.18  23098.95
4194304                49043.22   22557.32  23131.02
Thank you for your help,
Yussuf
-----Original Message-----
From: Awan, Ammar Ahmad [mailto:awan.10 at buckeyemail.osu.edu]
Sent: Thursday, January 24, 2019 4:14 AM
To: Yussuf Ali <yussuf.ali at jaea.go.jp>
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Performance difference in MPI_Allreduce calls between MVAPICH2-GDR and OpenMPI
Hi Yussuf,
Sorry to hear that you are seeing performance degradation. I have a few questions and suggestions.
Can you kindly let us know if this is a DGX-2 system? If not, please share more details, such as the GPU topology and the availability of NVLink on your system.
We have some new designs for the DGX-2 system that will be available in the next MVAPICH2-GDR release. The new designs provide much better performance.
In the meantime, is it possible for us to get access to your system? This would enable us to help you more quickly and effectively.
Thanks,
Ammar
On Tue, Jan 22, 2019 at 8:13 PM Yussuf Ali <yussuf.ali at jaea.go.jp> wrote:
Dear MVAPICH developers and users,
in our software we noticed a performance degradation in MPI_Allreduce calls when using MVAPICH2-GDR compared to OpenMPI.
The software (a Krylov solver) runs several iterations, and in each iteration data is reduced twice using MPI_Allreduce.
The send and receive buffers are both allocated as device memory on the GPU. We measured the total time spent in the MPI_Allreduce calls.
16 GPU case (V100)
MVAPICH2-GDR(2.3)
1. MPI_Allreduce : 0.27 seconds
2. MPI_Allreduce: 1.9 seconds
OpenMPI
1. MPI_Allreduce: 0.10 seconds
2. MPI_Allreduce: 0.19 seconds
The data sizes are:
1. MPI_Allreduce: 720 bytes
2. MPI_Allreduce: 1,160 bytes
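A minimal sketch of how we take this measurement (the iteration count and buffer contents are illustrative, not our actual solver; a CUDA-aware MPI such as MVAPICH2-GDR is assumed so that device pointers can be passed directly to MPI_Allreduce):

```c
/* Sketch: time repeated MPI_Allreduce calls on GPU device buffers.
 * The first reduction in our solver moves 720 bytes = 90 doubles. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    const int n = 720 / sizeof(double);   /* 720-byte reduction */
    double *d_send, *d_recv;
    cudaMalloc((void **)&d_send, n * sizeof(double));
    cudaMalloc((void **)&d_recv, n * sizeof(double));
    cudaMemset(d_send, 0, n * sizeof(double));

    const int iters = 1000;               /* illustrative iteration count */
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++)
        MPI_Allreduce(d_send, d_recv, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double total = MPI_Wtime() - t0;      /* total time across all calls */

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("total MPI_Allreduce time: %f s\n", total);

    cudaFree(d_send);
    cudaFree(d_recv);
    MPI_Finalize();
    return 0;
}
```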
Are there any parameters to tune the MPI_Allreduce performance in MVAPICH2-GDR?
Thank you for your help,
Yussuf
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
More information about the mvapich-discuss mailing list