[mvapich-discuss] intra node communication between GPUs
Jeff Hammond
jhammond at alcf.anl.gov
Mon Jun 10 17:30:13 EDT 2013
Hi Ye,
You need to look at your data more carefully. The bandwidth for "two
GPUs on the same PCIe bus" (I call this "A") is _higher_ than for "two
GPUs on different nodes" (I call this "B") at message sizes up to and
including 131072 bytes. For sizes of 262144 bytes and above, B provides
less than 10% higher bandwidth (for example, about 6.7% at 4194304
bytes: 5405.89 vs. 5067.23 MB/s).
Here is the data side-by-side:
# A: two GPUs on the same PCIe bus
# B: two GPUs on two different nodes
# Size (bytes)    A (MB/s)    B (MB/s)
1 0.13 0.09
2 0.25 0.17
4 0.51 0.34
8 1.01 0.69
16 2.03 1.35
32 4.04 2.68
64 8.08 5.46
128 16.25 10.88
256 34.22 21.89
512 67.96 43.00
1024 137.11 83.51
2048 272.85 160.64
4096 540.70 297.13
8192 1092.38 524.53
16384 2125.43 1547.57
32768 3134.74 2525.94
65536 4022.73 3607.36
131072 4836.31 4556.56
262144 4944.00 5174.55
524288 5009.88 5318.21
1048576 5019.20 5346.60
2097152 5052.10 5373.96
4194304 5067.23 5405.89
When you communicate within the node, you have the advantage of shared
resources and thus lower latency, but you are also contending for those
shared resources. On the other hand, two nodes have two CPUs and two
NICs to move data in concert; however, the latency is higher because
there are more hops.
I suspect the two-node case wins in the large-message limit due to
pipelining and other optimizations that allow for good utilization of
both the sender's and the receiver's PCIe links at the same time.
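To make the pipelining idea concrete, here is a minimal sketch (not
MVAPICH2's actual implementation; the buffer names and the 512 KiB
chunk size are my own illustrative choices) of a chunked transfer
staged through pinned host memory. While one chunk is moving
host-to-device on one stream, the next chunk can already be moving
device-to-host on another stream, so both halves of the path stay busy:

/* Sketch only: chunked, pipelined transfer through a pinned host
 * staging buffer, using two CUDA streams so the D2H and H2D copies of
 * different chunks overlap. Sizes and names are illustrative. */
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); \
    exit(1); } } while (0)

int main(void)
{
    const size_t total = 64UL << 20;   /* 64 MiB payload              */
    const size_t chunk = 512UL << 10;  /* 512 KiB pipeline stage size */

    char *src, *dst, *stage;
    CHECK(cudaMalloc((void **)&src, total));
    CHECK(cudaMalloc((void **)&dst, total));
    CHECK(cudaMallocHost((void **)&stage, total)); /* pinned staging  */

    cudaStream_t up, down;             /* D2H and H2D directions      */
    CHECK(cudaStreamCreate(&up));
    CHECK(cudaStreamCreate(&down));

    for (size_t off = 0; off < total; off += chunk) {
        cudaEvent_t staged;
        CHECK(cudaEventCreateWithFlags(&staged, cudaEventDisableTiming));
        /* pull the chunk off the source device */
        CHECK(cudaMemcpyAsync(stage + off, src + off, chunk,
                              cudaMemcpyDeviceToHost, up));
        CHECK(cudaEventRecord(staged, up));
        /* push it to the destination as soon as it has landed,
         * without waiting for later chunks to be staged */
        CHECK(cudaStreamWaitEvent(down, staged, 0));
        CHECK(cudaMemcpyAsync(dst + off, stage + off, chunk,
                              cudaMemcpyHostToDevice, down));
        CHECK(cudaEventDestroy(staged)); /* freed once it completes   */
    }
    CHECK(cudaStreamSynchronize(down));

    CHECK(cudaStreamDestroy(up));
    CHECK(cudaStreamDestroy(down));
    CHECK(cudaFreeHost(stage));
    CHECK(cudaFree(src));
    CHECK(cudaFree(dst));
    return 0;
}

In the intra-node case both halves of such a pipeline share one PCIe
path, whereas across two nodes each half runs over its own PCIe link
(with the NICs in between), which is consistent with the crossover you
see at large message sizes.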
Best,
Jeff
On Mon, Jun 10, 2013 at 8:30 AM, Ye Wang <wang1351 at purdue.edu> wrote:
> Hi,
>
> I am using mvapich2-1.9 for communication between GPUs. When I tested
> the bandwidth with osu_bw from the osu_benchmarks package on a GPU
> cluster, I found that the bandwidth between two GPUs on the same PCIe
> bus is lower than the bandwidth between two GPUs on different nodes.
> I cannot figure out the reason. I think communication between two GPUs
> on the same PCIe bus should go directly over the PCIe bus with the
> support of GPUDirect v2, so why is it slower than communication
> between two GPUs on separate nodes?
>
> The following is the bandwidth between two GPUs on the same PCIe bus:
>
> # Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
> # Size Bandwidth (MB/s)
> 1 0.13
> 2 0.25
> 4 0.51
> 8 1.01
> 16 2.03
> 32 4.04
> 64 8.08
> 128 16.25
> 256 34.22
> 512 67.96
> 1024 137.11
> 2048 272.85
> 4096 540.70
> 8192 1092.38
> 16384 2125.43
> 32768 3134.74
> 65536 4022.73
> 131072 4836.31
> 262144 4944.00
> 524288 5009.88
> 1048576 5019.20
> 2097152 5052.10
> 4194304 5067.23
>
> And this is the bandwidth between GPUs on two nodes:
>
> # Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
> # Size Bandwidth (MB/s)
> 1 0.09
> 2 0.17
> 4 0.34
> 8 0.69
> 16 1.35
> 32 2.68
> 64 5.46
> 128 10.88
> 256 21.89
> 512 43.00
> 1024 83.51
> 2048 160.64
> 4096 297.13
> 8192 524.53
> 16384 1547.57
> 32768 2525.94
> 65536 3607.36
> 131072 4556.56
> 262144 5174.55
> 524288 5318.21
> 1048576 5346.60
> 2097152 5373.96
> 4194304 5405.89
>
> Thanks,
>
> Ye
--
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
ALCF docs: http://www.alcf.anl.gov/user-guides