[mvapich-discuss] Regular OFED vs MLNX OFED on systems with P100 GPUs
Raghu Reddy
raghu.reddy at noaa.gov
Tue Feb 27 10:16:18 EST 2018
In the context of our cluster configuration:
Hardware:
Intel Haswell processors, 2 sockets/node (20 cores/node)
8 P100 GPUs - 4 GPUs connected to socket 0 and 4 GPUs
connected to socket 1
Single rail MLNX QDR fabric connected to socket 1
Software:
Running RHEL 7.4
Using stock OFED (Later there is a question about whether
MLNX OFED is required)
Intel 18.1 compiler
Mvapich2-GDR/2.2-4 Intel version
As an aside, the reason for sticking with stock OFED is that we have a mixed
environment: the non-GPU part of the machine, with about 1K nodes, has Intel
TrueScale fabric, and about 100 nodes have MLNX fabric. We would prefer to
have a single image for all the nodes.
I am looking at the following documentation, specifically section 7:
http://mvapich.cse.ohio-state.edu/userguide/gdr/2.2/
From my understanding, since P100 GPUs have the GDR capability, it is not
necessary to use GPUDIRECT. Is that a correct statement? Or is GDRCOPY
necessary even for nodes with these latest GPUs?
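But I am seeing some differences with and without MV2_USE_GPUDIRECT_GDRCOPY
being set. Below is osu_bw output from two runs, pasted side by side: the
left pair of columns is with the variable set to 0, the right pair with it
set to 1. In each pair, the first column is the message size in bytes and
the second is the bandwidth in MB/s: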
sg001% paste osu_bw.out-gdrcopy-0 osu_bw.out-gdrcopy-1
1 0.03 1 0.75
2 0.06 2 1.41
4 0.12 4 3.03
8 0.24 8 5.61
16 0.47 16 7.26
32 0.94 32 0.99
64 1.89 64 1.99
128 3.77 128 3.92
256 7.55 256 7.91
512 15.05 512 15.77
1024 29.99 1024 31.41
2048 59.58 2048 62.50
4096 118.58 4096 124.17
8192 421.15 8192 380.02
16384 451.32 16384 966.68
32768 1330.50 32768 1545.72
65536 1835.08 65536 2043.06
131072 2148.80 131072 2283.34
262144 1922.57 262144 1947.62
524288 3747.85 524288 3823.84
1048576 3810.02 1048576 3844.70
2097152 3838.38 2097152 3856.94
4194304 3853.08 4194304 3862.78
sg001%
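For long messages (256 KB and above) there is no significant difference in
performance; the two runs are within about 2% of each other. For the
smallest messages (1 to 16 bytes), however, bandwidth is roughly 15 to 25
times higher with the variable set to 1, and at 16 KB it is about 2 times
higher. Is this what is expected?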
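For anyone who wants to reproduce the comparison, here is a minimal sketch
that turns the pasted output into per-size ratios. It assumes the four-column
layout shown above (size, bandwidth with the variable set to 0, size,
bandwidth with it set to 1); the function names and the embedded sample rows
are only for illustration, and the rows are copied from the output above.

```python
def parse_paste_output(text):
    """Parse 'size bw0 size bw1' lines into (size, bw0, bw1) tuples.

    Lines that do not consist of four numeric fields (shell prompts,
    blank lines, osu_bw header comments) are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        try:
            size0, bw0 = int(fields[0]), float(fields[1])
            size1, bw1 = int(fields[2]), float(fields[3])
        except ValueError:
            continue  # e.g. the "sg001% paste ..." prompt line
        if size0 != size1:
            raise ValueError(f"message sizes do not line up: {size0} vs {size1}")
        rows.append((size0, bw0, bw1))
    return rows


def speedups(rows):
    """Return (size, bw1 / bw0) for each row."""
    return [(size, bw1 / bw0) for size, bw0, bw1 in rows]


# A few rows copied from the osu_bw output above.
SAMPLE = """\
1 0.03 1 0.75
16 0.47 16 7.26
32 0.94 32 0.99
16384 451.32 16384 966.68
4194304 3853.08 4194304 3862.78
"""

if __name__ == "__main__":
    for size, ratio in speedups(parse_paste_output(SAMPLE)):
        print(f"{size:>8d} bytes: gdrcopy=1 / gdrcopy=0 = {ratio:6.2f}x")
```

Feeding it the full output of the paste command instead of SAMPLE gives the
ratio for every message size.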
A similar question about regular OFED, from the same section: since these
are newer GPUs, is the mvapich2-gdr library capable of taking advantage of
the GDR capability even without MLNX OFED?
I am not sure whether this second question should instead be put to Red Hat
support or MLNX support.
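Whichever support channel this goes to, it may help to report which kernel
modules are loaded on the GPU nodes. The sketch below is only a rough check
and rests on an assumption: that GPUDirect RDMA support appears as a loaded
`nv_peer_mem` module and GDRCopy as a loaded `gdrdrv` module. Those names
come from common installations, not from the user guide section cited above,
so adjust them for the local setup.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text.

    Each non-empty line starts with the module name as its first field.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def report(proc_modules_text, wanted=("nv_peer_mem", "gdrdrv")):
    """Map each wanted module name (assumed names) to True if it is loaded."""
    present = loaded_modules(proc_modules_text)
    return {name: name in present for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    for name, is_loaded in report(text).items():
        print(f"{name}: {'loaded' if is_loaded else 'not loaded'}")
```

A missing module is not proof of a misconfiguration; it is just one more
data point to include alongside the osu_bw numbers.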
We are trying to determine whether we have a problem in our hardware/software
configuration or whether this is what is expected.
We appreciate any comments and suggestions about our observations above!
Thanks,
Raghu