[mvapich-discuss] Regular OFED vs MLNX OFED on systems with P100 GPUs
Subramoni, Hari
subramoni.1 at osu.edu
Tue Feb 27 11:14:08 EST 2018
Hello,
GDRCopy is meant for the very small message range, so the behavior you are seeing is expected.
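As a side note, the message-size cutoff below which GDRCopy is used can be tuned at run time. A minimal sketch, assuming the MV2_GPUDIRECT_GDRCOPY_LIMIT parameter described in the MVAPICH2-GDR 2.2 userguide (the 8192-byte value is an assumed default, and the host names and path are illustrative):

  # Enable GDRCopy and cap its use at 8 KB messages (hosts/paths are illustrative)
  mpirun_rsh -np 2 sg001 sg002 MV2_USE_GPUDIRECT_GDRCOPY=1 MV2_GPUDIRECT_GDRCOPY_LIMIT=8192 ./osu_bw D D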
GPUDirect RDMA and GDRCopy are two different technologies; one is not needed for the other to work. Both technologies need certain drivers from NVIDIA and Mellanox to be installed. Without these drivers, they will not work, and MVAPICH2-GDR will not be able to take advantage of them. To the best of our knowledge, GPUDirect RDMA needs MLNX_OFED and will not work with non-MLNX OFED.
From your performance numbers, it looks like you have the GDRCopy module installed, and hence MVAPICH2-GDR is able to take advantage of GDRCopy to deliver better performance in the smaller message range.
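A quick way to check which of these pieces is present on a node is to look for the corresponding kernel modules (module names assumed from the NVIDIA GDRCopy and Mellanox peer-memory packages; adjust for your driver stack):

  # GDRCopy kernel module (used for the small-message fast path)
  lsmod | grep gdrdrv
  # GPUDirect RDMA peer-memory module (ships with MLNX_OFED / nvidia-peer-memory)
  lsmod | grep nv_peer_mem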
Please let me know if you have any other questions.
Thx,
Hari.
From: mvapich-discuss-bounces at cse.ohio-state.edu On Behalf Of Raghu Reddy
Sent: Tuesday, February 27, 2018 9:16 AM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] Regular OFED vs MLNX OFED on systems with P100 GPUs
For context, here is our cluster configuration:
Hardware:
Intel Haswell processors, 2 sockets/node (20 cores/node)
8 P100 GPUs - 4 GPUs connected to socket 0 and 4 GPUs connected to socket 1
Single rail MLNX QDR fabric connected to socket 1
Software:
Running RHEL 7.4
Using stock OFED (there is a question below about whether MLNX OFED is required)
Intel 18.1 compiler
MVAPICH2-GDR/2.2-4, Intel version
As an aside, the reason for sticking with stock OFED is that we have a mixed environment: the non-GPU part of the machine, about 1K nodes, has an Intel TrueScale fabric, while about 100 nodes have a Mellanox fabric, and we would prefer to have a single image for all the nodes.
I am looking at the following documentation, specifically section 7:
http://mvapich.cse.ohio-state.edu/userguide/gdr/2.2/
From my understanding, since the P100 GPUs have the GDR capability, it is not necessary to use GPUDIRECT; is that a correct statement? Or is GDRCOPY necessary even for nodes with these latest GPUs?
But I am seeing some differences with and without MV2_USE_GPUDIRECT_GDRCOPY being set. I am including the output from two runs, one with this variable set to 0 and one set to 1, and I have pasted the osu_bw output below (columns are message size in bytes and bandwidth in MB/s; the left pair is GDRCOPY=0, the right pair is GDRCOPY=1):
sg001% paste osu_bw.out-gdrcopy-0 osu_bw.out-gdrcopy-1
1 0.03 1 0.75
2 0.06 2 1.41
4 0.12 4 3.03
8 0.24 8 5.61
16 0.47 16 7.26
32 0.94 32 0.99
64 1.89 64 1.99
128 3.77 128 3.92
256 7.55 256 7.91
512 15.05 512 15.77
1024 29.99 1024 31.41
2048 59.58 2048 62.50
4096 118.58 4096 124.17
8192 421.15 8192 380.02
16384 451.32 16384 966.68
32768 1330.50 32768 1545.72
65536 1835.08 65536 2043.06
131072 2148.80 131072 2283.34
262144 1922.57 262144 1947.62
524288 3747.85 524288 3823.84
1048576 3810.02 1048576 3844.70
2097152 3838.38 2097152 3856.94
4194304 3853.08 4194304 3862.78
sg001%
For large messages there is no significant difference in performance, but for smaller messages there is quite a bit of difference. Is this what is expected?
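For reference, a minimal sketch of how the two runs above might have been launched, using MVAPICH2's mpirun_rsh syntax (the host names, the osu_bw path, and the explicit MV2_USE_CUDA=1 setting are illustrative assumptions):

  # Run 1: GDRCopy disabled, GPU (device) buffers on both sides
  mpirun_rsh -np 2 sg001 sg002 MV2_USE_CUDA=1 MV2_USE_GPUDIRECT_GDRCOPY=0 ./osu_bw D D > osu_bw.out-gdrcopy-0
  # Run 2: GDRCopy enabled
  mpirun_rsh -np 2 sg001 sg002 MV2_USE_CUDA=1 MV2_USE_GPUDIRECT_GDRCOPY=1 ./osu_bw D D > osu_bw.out-gdrcopy-1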
A similar question about regular OFED, from the same section: since these are newer GPUs, is the MVAPICH2-GDR library capable of taking advantage of the GDR capability even without MLNX OFED?
I am not sure whether the second question should instead be put to Red Hat support or Mellanox support.
We are trying to determine whether we have a problem in our hardware/software configuration or if this is the expected behavior.
We appreciate any comments and suggestions about our observations above!
Thanks,
Raghu