[mvapich-discuss] intra node communication between GPUs
Jeff Hammond
jhammond at alcf.anl.gov
Mon Jun 10 17:30:13 EDT 2013
Hi Ye,
You need to look at your data more carefully. The bandwidth for "two
GPUs on the same PCIe bus" (I call this "A") is _higher_ than for "two
GPUs on different nodes" (I call this "B") at message sizes up to and
including 131072 bytes. For sizes of 262144 bytes and above, B provides
less than 10% higher bandwidth (for example, about 6.7% at 4194304
bytes: 5405.89 vs. 5067.23 MB/s).
Here is the data side-by-side:
# A: two GPUs on the same PCIe bus
# B: two GPUs on two different nodes
# Size (bytes)    A (MB/s)    B (MB/s)
1 0.13 0.09
2 0.25 0.17
4 0.51 0.34
8 1.01 0.69
16 2.03 1.35
32 4.04 2.68
64 8.08 5.46
128 16.25 10.88
256 34.22 21.89
512 67.96 43.00
1024 137.11 83.51
2048 272.85 160.64
4096 540.70 297.13
8192 1092.38 524.53
16384 2125.43 1547.57
32768 3134.74 2525.94
65536 4022.73 3607.36
131072 4836.31 4556.56
262144 4944.00 5174.55
524288 5009.88 5318.21
1048576 5019.20 5346.60
2097152 5052.10 5373.96
4194304 5067.23 5405.89
When you communicate within the node, you have the advantage of shared
resources and thus lower latency, but you are also contending for those
shared resources. On the other hand, two nodes have two CPUs and two
NICs to move data in concert; however, the latency is higher because
there are more hops.
I suspect the two-node case wins in the large-message limit due to
pipelining and other optimizations that allow for good utilization of
both the sender's and the receiver's PCIe links at the same time.
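To make the pipelining idea concrete, here is a minimal sketch (not
MVAPICH2's actual implementation; the buffer names and the 512 KiB
chunk size are my own illustrative choices) of a chunked transfer
staged through pinned host memory. While one chunk is moving
host-to-device on one stream, the next chunk can already be moving
device-to-host on another stream, so both halves of the path stay busy:

/* Sketch only: chunked, pipelined transfer through a pinned host
 * staging buffer, using two CUDA streams so the D2H and H2D copies of
 * different chunks overlap. Sizes and names are illustrative. */
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); \
    exit(1); } } while (0)

int main(void)
{
    const size_t total = 64UL << 20;   /* 64 MiB payload              */
    const size_t chunk = 512UL << 10;  /* 512 KiB pipeline stage size */

    char *src, *dst, *stage;
    CHECK(cudaMalloc((void **)&src, total));
    CHECK(cudaMalloc((void **)&dst, total));
    CHECK(cudaMallocHost((void **)&stage, total)); /* pinned staging  */

    cudaStream_t up, down;             /* D2H and H2D directions      */
    CHECK(cudaStreamCreate(&up));
    CHECK(cudaStreamCreate(&down));

    for (size_t off = 0; off < total; off += chunk) {
        cudaEvent_t staged;
        CHECK(cudaEventCreateWithFlags(&staged, cudaEventDisableTiming));
        /* pull the chunk off the source device */
        CHECK(cudaMemcpyAsync(stage + off, src + off, chunk,
                              cudaMemcpyDeviceToHost, up));
        CHECK(cudaEventRecord(staged, up));
        /* push it to the destination as soon as it has landed,
         * without waiting for later chunks to be staged */
        CHECK(cudaStreamWaitEvent(down, staged, 0));
        CHECK(cudaMemcpyAsync(dst + off, stage + off, chunk,
                              cudaMemcpyHostToDevice, down));
        CHECK(cudaEventDestroy(staged)); /* freed once it completes   */
    }
    CHECK(cudaStreamSynchronize(down));

    CHECK(cudaStreamDestroy(up));
    CHECK(cudaStreamDestroy(down));
    CHECK(cudaFreeHost(stage));
    CHECK(cudaFree(src));
    CHECK(cudaFree(dst));
    return 0;
}

In the intra-node case both halves of such a pipeline share one PCIe
path, whereas across two nodes each half runs over its own PCIe link
(with the NICs in between), which is consistent with the crossover you
see at large message sizes.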
Best,
Jeff
On Mon, Jun 10, 2013 at 8:30 AM, Ye Wang <wang1351 at purdue.edu> wrote:
> Hi,
>
> I am using mvapich2-1.9 for communication between GPUs. When I tested
> the bandwidth with osu_bw from the osu_benchmarks package on a GPU
> cluster, I found that the bandwidth between two GPUs on the same PCIe
> bus is lower than the bandwidth between two GPUs on different nodes.
> I cannot figure out the reason. I think communication between two GPUs
> on the same PCIe bus should go directly over the PCIe bus with the
> support of GPUDirect v2, so why is it slower than communication
> between two GPUs on separate nodes?
>
> The following is the bandwidth between two GPUs on the same PCIe bus:
>
> # Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
> # Size Bandwidth (MB/s)
> 1 0.13
> 2 0.25
> 4 0.51
> 8 1.01
> 16 2.03
> 32 4.04
> 64 8.08
> 128 16.25
> 256 34.22
> 512 67.96
> 1024 137.11
> 2048 272.85
> 4096 540.70
> 8192 1092.38
> 16384 2125.43
> 32768 3134.74
> 65536 4022.73
> 131072 4836.31
> 262144 4944.00
> 524288 5009.88
> 1048576 5019.20
> 2097152 5052.10
> 4194304 5067.23
>
> And this is the bandwidth between GPUs on two nodes:
>
> # Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
> # Size Bandwidth (MB/s)
> 1 0.09
> 2 0.17
> 4 0.34
> 8 0.69
> 16 1.35
> 32 2.68
> 64 5.46
> 128 10.88
> 256 21.89
> 512 43.00
> 1024 83.51
> 2048 160.64
> 4096 297.13
> 8192 524.53
> 16384 1547.57
> 32768 2525.94
> 65536 3607.36
> 131072 4556.56
> 262144 5174.55
> 524288 5318.21
> 1048576 5346.60
> 2097152 5373.96
> 4194304 5405.89
>
> Thanks,
>
> Ye
--
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
ALCF docs: http://www.alcf.anl.gov/user-guides