[mvapich-discuss] OSU MPI-CUDA Benchmarks
sreeram potluri
potluri at cse.ohio-state.edu
Sat Jul 28 11:47:54 EDT 2012
Hi Brody,
Can you give us some information about the MPI library (and version) you
are using and the setup/configuration of your GPU nodes? Can you also send
us the output of "lspci -tv" on the node?
There seems to be something wrong with the intra-node experiments (either
the node setup or the runtime configuration). The latencies are abnormally
high for the D-H and H-D runs; they should be in the range of 20-25 usec.
You are seeing a similar hit on the bandwidth.
# OSU MPI-CUDA Latency Test v3.6
# Send Buffer on DEVICE (D) and Receive Buffer on HOST (H)
# Size Latency (us)
0 0.44
1 134.15
2 134.37
4 133.89
8 133.82
16 133.75
32 133.71
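For reference, intra-node runs of these tests are usually launched along these
lines (a sketch only; the hostnames and benchmark path are placeholders for
your setup, and MV2_USE_CUDA applies if you are using MVAPICH2):

```shell
# Two ranks on the same node, send buffer on DEVICE, receive buffer on HOST.
# MV2_USE_CUDA=1 enables MVAPICH2's CUDA-aware communication path; if it is
# not set, device buffers are staged through host memory on a slow path.
# "host1 host1" and ./osu_latency are placeholders for your environment.
mpirun_rsh -np 2 host1 host1 MV2_USE_CUDA=1 ./osu_latency D H
```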
Best
Sreeram Potluri
On Sat, Jul 28, 2012 at 7:38 AM, Brody Huval <brodyh at stanford.edu> wrote:
> Hi All,
>
> I was running the MPI-CUDA OSU benchmarks and noticed that I was getting
> better bandwidth with inter-node H-D and D-H messages than with
> intra-node ones. I was also getting very slow bandwidth with intra-node
> Device-to-Host messaging. Does anybody know why this would be, or have any
> references on what goes on during GPU messaging? I've shown a few of
> the results below.
>
> Thank you,
>
> Brody
>
>
> I also put all results from the benchmark tests here if anyone is
> interested.
> http://www-nlp.stanford.edu/~brodyh/doku.php?id=gpu:week_7_23_12
>
>
> *Intranode*
>
> # OSU MPI-CUDA Bandwidth Test v3.6
> # Send Buffer on HOST (H) and Receive Buffer on DEVICE (D)
> # Size Bandwidth (MB/s)
> 1 0.01
> 2 0.01
> 4 0.02
> 8 0.05
> 16 0.10
> 32 0.19
> 64 0.38
> 128 0.76
> 256 1.50
> 512 3.06
> 1024 6.07
> 2048 12.09
> 4096 23.76
> 8192 44.12
> 16384 682.71
> 32768 1084.14
> 65536 1536.38
> 131072 583.10
> 262144 1011.24
> 524288 1622.84
> 1048576 1575.20
> 2097152 1473.78
> 4194304 1549.10
>
>
> # OSU MPI-CUDA Bandwidth Test v3.6
> # Send Buffer on DEVICE (D) and Receive Buffer on HOST (H)
> # Size Bandwidth (MB/s)
> 1 0.00
> 2 0.00
> 4 0.01
> 8 0.01
> 16 0.03
> 32 0.05
> 64 0.11
> 128 0.22
> 256 0.44
> 512 0.87
> 1024 1.73
> 2048 3.43
> 4096 6.69
> 8192 12.98
> 16384 24.10
> 32768 45.00
> 65536 79.13
> 131072 126.97
> 262144 187.86
> 524288 225.30
> 1048576 229.49
> 2097152 231.12
> 4194304 217.52
>
>
>
> *Internode*
>
> # OSU MPI-CUDA Bandwidth Test v3.6
> # Send Buffer on HOST (H) and Receive Buffer on DEVICE (D)
> # Size Bandwidth (MB/s)
> 1 0.01
> 2 0.01
> 4 0.02
> 8 0.05
> 16 0.10
> 32 0.20
> 64 0.40
> 128 0.78
> 256 1.58
> 512 3.14
> 1024 6.30
> 2048 12.56
> 4096 24.50
> 8192 48.77
> 16384 89.95
> 32768 174.05
> 65536 326.03
> 131072 634.41
> 262144 1605.97
> 524288 2666.19
> 1048576 2686.49
> 2097152 2688.81
> 4194304 2683.35
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>