[mvapich-discuss] OSU MPI-CUDA Benchmarks
Brody Huval
brodyh at stanford.edu
Sat Jul 28 07:38:03 EDT 2012
Hi All,
I was running the OSU MPI-CUDA benchmarks and noticed that I was getting better bandwidth with internode H→D and D→H messages than with intranode ones. I was also getting very slow bandwidth with intranode device-to-host messaging. Does anybody know why this would be, or have any references on what happens during GPU messaging? I've shown a few of the results below.
Thank you,
Brody
I've also put all the results from the benchmark runs here, in case anyone is interested:
http://www-nlp.stanford.edu/~brodyh/doku.php?id=gpu:week_7_23_12
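For reference, this is roughly how the numbers above were produced. A sketch of the invocation, assuming MVAPICH2's CUDA support is enabled via MV2_USE_CUDA=1 and that the CUDA-enabled OSU benchmarks take the send/receive buffer locations (H for host, D for device) as arguments; hostnames and paths are placeholders, not my actual setup:

```shell
# Enable CUDA-aware transfers in MVAPICH2 (device pointers may be
# passed directly to MPI calls when this is set).
export MV2_USE_CUDA=1

# Intranode: both ranks on the same host.
# "H D" = send buffer on Host, receive buffer on Device.
mpirun_rsh -np 2 node01 node01 MV2_USE_CUDA=1 ./osu_bw H D

# Internode: ranks on two different hosts.
mpirun_rsh -np 2 node01 node02 MV2_USE_CUDA=1 ./osu_bw H D

# Swap the arguments ("D H") for the Device-to-Host direction.
```

Without MV2_USE_CUDA=1, passing device pointers to MPI would require explicit cudaMemcpy staging in the benchmark itself.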
Intranode
# OSU MPI-CUDA Bandwidth Test v3.6
# Send Buffer on HOST (H) and Receive Buffer on DEVICE (D)
# Size Bandwidth (MB/s)
1 0.01
2 0.01
4 0.02
8 0.05
16 0.10
32 0.19
64 0.38
128 0.76
256 1.50
512 3.06
1024 6.07
2048 12.09
4096 23.76
8192 44.12
16384 682.71
32768 1084.14
65536 1536.38
131072 583.10
262144 1011.24
524288 1622.84
1048576 1575.20
2097152 1473.78
4194304 1549.10
# OSU MPI-CUDA Bandwidth Test v3.6
# Send Buffer on DEVICE (D) and Receive Buffer on HOST (H)
# Size Bandwidth (MB/s)
1 0.00
2 0.00
4 0.01
8 0.01
16 0.03
32 0.05
64 0.11
128 0.22
256 0.44
512 0.87
1024 1.73
2048 3.43
4096 6.69
8192 12.98
16384 24.10
32768 45.00
65536 79.13
131072 126.97
262144 187.86
524288 225.30
1048576 229.49
2097152 231.12
4194304 217.52
Internode
# OSU MPI-CUDA Bandwidth Test v3.6
# Send Buffer on HOST (H) and Receive Buffer on DEVICE (D)
# Size Bandwidth (MB/s)
1 0.01
2 0.01
4 0.02
8 0.05
16 0.10
32 0.20
64 0.40
128 0.78
256 1.58
512 3.14
1024 6.30
2048 12.56
4096 24.50
8192 48.77
16384 89.95
32768 174.05
65536 326.03
131072 634.41
262144 1605.97
524288 2666.19
1048576 2686.49
2097152 2688.81
4194304 2683.35