[mvapich-discuss] OSU MPI-CUDA Benchmarks
Brody Huval
brodyh at stanford.edu
Sat Jul 28 07:38:03 EDT 2012
Hi All,
I was running the OSU MPI-CUDA benchmarks and noticed that I was getting better bandwidth with internode H→D and D→H messages than with intranode ones. I was also getting very slow bandwidth with intranode device-to-host messaging. Does anybody know why this would be, or have any references on what happens during GPU messaging? I've shown a few of the results below.
Thank you,
Brody
I've also put all the results from the benchmark runs here, in case anyone is interested:
http://www-nlp.stanford.edu/~brodyh/doku.php?id=gpu:week_7_23_12
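For reference, this is roughly how the numbers above were produced. A sketch of the invocation, assuming MVAPICH2's CUDA support is enabled via MV2_USE_CUDA=1 and that the CUDA-enabled OSU benchmarks take the send/receive buffer locations (H for host, D for device) as arguments; hostnames and paths are placeholders, not my actual setup:

```shell
# Enable CUDA-aware transfers in MVAPICH2 (device pointers may be
# passed directly to MPI calls when this is set).
export MV2_USE_CUDA=1

# Intranode: both ranks on the same host.
# "H D" = send buffer on Host, receive buffer on Device.
mpirun_rsh -np 2 node01 node01 MV2_USE_CUDA=1 ./osu_bw H D

# Internode: ranks on two different hosts.
mpirun_rsh -np 2 node01 node02 MV2_USE_CUDA=1 ./osu_bw H D

# Swap the arguments ("D H") for the Device-to-Host direction.
```

Without MV2_USE_CUDA=1, passing device pointers to MPI would require explicit cudaMemcpy staging in the benchmark itself.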
Intranode
# OSU MPI-CUDA Bandwidth Test v3.6
# Send Buffer on HOST (H) and Receive Buffer on DEVICE (D)
# Size Bandwidth (MB/s)
1 0.01
2 0.01
4 0.02
8 0.05
16 0.10
32 0.19
64 0.38
128 0.76
256 1.50
512 3.06
1024 6.07
2048 12.09
4096 23.76
8192 44.12
16384 682.71
32768 1084.14
65536 1536.38
131072 583.10
262144 1011.24
524288 1622.84
1048576 1575.20
2097152 1473.78
4194304 1549.10
# OSU MPI-CUDA Bandwidth Test v3.6
# Send Buffer on DEVICE (D) and Receive Buffer on HOST (H)
# Size Bandwidth (MB/s)
1 0.00
2 0.00
4 0.01
8 0.01
16 0.03
32 0.05
64 0.11
128 0.22
256 0.44
512 0.87
1024 1.73
2048 3.43
4096 6.69
8192 12.98
16384 24.10
32768 45.00
65536 79.13
131072 126.97
262144 187.86
524288 225.30
1048576 229.49
2097152 231.12
4194304 217.52
Internode
# OSU MPI-CUDA Bandwidth Test v3.6
# Send Buffer on HOST (H) and Receive Buffer on DEVICE (D)
# Size Bandwidth (MB/s)
1 0.01
2 0.01
4 0.02
8 0.05
16 0.10
32 0.20
64 0.40
128 0.78
256 1.58
512 3.14
1024 6.30
2048 12.56
4096 24.50
8192 48.77
16384 89.95
32768 174.05
65536 326.03
131072 634.41
262144 1605.97
524288 2666.19
1048576 2686.49
2097152 2688.81
4194304 2683.35