[mvapich-discuss] CUDA running issue in MVAPICH2

Jonathan Perkins perkinjo at cse.ohio-state.edu
Thu Apr 9 12:28:54 EDT 2015


Hi Dun.  Your results look "okay" to me.  Transfers that originate from or
land in GPU memory incur much higher latency than those between standard
CPU (host) memory.

We are able to achieve slightly lower latency in-house, but this may be due
to differences between our hardware and build settings and yours.  Can you
share the output of mpiname -a, as well as the output from an osu_latency
run with MV2_SHOW_ENV_INFO=1 also set?
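To put that gap in perspective, here is a quick sketch (values copied from the D-to-D and H-to-H tables quoted below) comparing the two cases at a few message sizes:

```python
# Latencies (us) copied from the quoted osu_latency runs below.
dd = {1: 63.42, 1024: 62.06, 4194304: 585.34}  # device-to-device
hh = {1: 0.92, 1024: 1.20, 4194304: 571.31}    # host-to-host

for size in dd:
    ratio = dd[size] / hh[size]
    print(f"{size} bytes: D-to-D is {ratio:.1f}x the H-to-H latency")
```

The ratio shrinks from roughly 69x at 1 byte to near parity at 4 MB, which is consistent with a fixed per-message CUDA transfer overhead dominating small messages.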

On Thu, Apr 9, 2015 at 12:12 PM randonlang at gmail.com <randonlang at gmail.com>
wrote:

> Thx, Jonathan, it works! And thanks, Khaled, too.
> Sorry to bother you again :p
> However, I got some odd output: D-to-D is far slower than H-to-H when
> transferring small messages, and even slower than H-to-D.
>
> here is the benchmark result:
>
> # OSU MPI-CUDA Latency Test
> # Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
> # Size            Latency (us)
> 1                        63.42
> 2                        63.02
> 4                        61.95
> 8                        61.96
> 16                       61.87
> 32                       61.95
> 64                       61.92
> 128                      61.94
> 256                      61.97
> 512                      61.98
> 1024                     62.06
> 2048                     62.05
> 4096                     62.12
> 8192                     62.15
> 16384                    74.19
> 32768                    74.25
> 65536                    75.24
> 131072                   82.66
> 262144                   81.32
> 524288                   85.70
> 1048576                 121.99
> 2097152                 272.36
> 4194304                 585.34
>
> # OSU MPI-CUDA Latency Test
> # Send Buffer on HOST (H) and Receive Buffer on HOST (H)
> # Size            Latency (us)
> 1                         0.92
> 2                         0.91
> 4                         0.91
> 8                         0.92
> 16                        0.91
> 32                        0.93
> 64                        0.99
> 128                       0.96
> 256                       1.03
> 512                       1.11
> 1024                      1.20
> 2048                      1.39
> 4096                      1.78
> 8192                      2.74
> 16384                     5.31
> 32768                     7.32
> 65536                     8.00
> 131072                   13.95
> 262144                   29.38
> 524288                   57.95
> 1048576                 115.65
> 2097152                 226.63
> 4194304                 571.31
>
>
>
> # OSU MPI-CUDA Latency Test
> # Send Buffer on HOST (H) and Receive Buffer on DEVICE (D)
> # Size            Latency (us)
> 1                         9.59
> 2                         9.73
> 4                         9.56
> 8                         9.66
> 16                        9.83
> 32                        9.63
> 64                        9.75
> 128                       8.57
> 256                       8.42
> 512                       8.87
> 1024                      8.62
> 2048                      8.79
> 4096                      9.34
> 8192                     10.37
> 16384                    12.40
> 32768                    19.03
> 65536                    21.84
> 131072                   35.24
> 262144                   66.08
> 524288                  110.40
> 1048576                 207.23
> 2097152                 354.09
> 4194304                 669.29
>
>
> From: Jonathan Perkins <perkinjo at cse.ohio-state.edu>
> Date: 2015-04-09 21:40
> To: Dun Liang <randonlang at gmail.com>; mvapich-discuss
> <mvapich-discuss at cse.ohio-state.edu>
> Subject: Re: [mvapich-discuss] CUDA running issue in MVAPICH2
>
> Hi Dun, can you try setting MV2_USE_CUDA=1 when you run the benchmarks
> with the device buffers?
>
> Example:
> mpirun_rsh -np 2 debian81 debian81 MV2_USE_CUDA=1 ./osu_latency D D
>
> On Thu, Apr 9, 2015 at 8:54 AM Dun Liang <randonlang at gmail.com> wrote:
>
>> Dear developers:
>>
>> Currently I have some problems running MVAPICH2 with CUDA.
>> The program is osu_latency; here is the error message:
>> ```
>> ┌─[liangdun at debian81] -
>> [~/mvapich/mvapich2-2.1rc2_ib/mvapich2-2.1rc2/osu_benchmarks/.libs] -
>> [2015-04-09 06:17:20]
>> └─[1] <> mpirun_rsh -np 2 debian81 debian81 ./osu_latency D D
>> # OSU MPI-CUDA Latency Test
>> # Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
>> # Size            Latency (us)
>> [debian81:mpi_rank_0][error_sighandler] Caught error: Segmentation fault
>> (signal 11)
>> [debian81:mpispawn_0][readline] Unexpected End-Of-File on file descriptor
>> 6. MPI process died?
>> [debian81:mpispawn_0][mtpmi_processops] Error while reading PMI socket.
>> MPI process died?
>> [debian81:mpispawn_0][child_handler] MPI process (rank: 0, pid: 1376)
>> terminated with signal 11 -> abort job
>> [debian81:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node
>> debian81 aborted: Error while reading a PMI socket (4)
>>
>> ```
>> It works fine when I run `./osu_latency H H`:
>> ```
>> ┌─[liangdun at debian81] -
>> [~/mvapich/mvapich2-2.1rc2_ib/mvapich2-2.1rc2/osu_benchmarks/.libs] -
>> [2015-04-09 06:17:41]
>> └─[1] <> mpirun_rsh -np 2 debian81 debian81 ./osu_latency H H
>> # OSU MPI-CUDA Latency Test
>> # Send Buffer on HOST (H) and Receive Buffer on HOST (H)
>> # Size            Latency (us)
>> 1                         0.28
>> 2                         0.27
>> 4                         0.27
>> 8                         0.29
>> 16                        0.27
>> 32                        0.28
>> 64                        0.31
>> 128                       0.33
>> 256                       0.39
>> 512                       0.46
>> 1024                      0.56
>> 2048                      0.75
>> 4096                      1.24
>> 8192                      1.99
>> 16384                     3.71
>> 32768                     6.49
>> 65536                     6.96
>> 131072                   12.95
>> 262144                   27.73
>> 524288                   56.53
>> 1048576                 113.61
>> 2097152                 226.53
>> 4194304                 628.29
>>
>> ```
>>
>> Here is my MPI version info:
>> ```
>> MVAPICH2 Version:       2.1rc2
>> MVAPICH2 Release date:  Thu Mar 12 20:00:00 EDT 2014
>> MVAPICH2 Device:        ch3:mrail
>> MVAPICH2 configure:     --prefix=/home/liangdun/mvapich/build
>> --enable-cuda --disable-mcast --with-cuda=/usr/local/cuda
>> --with-device=ch3:mrail
>> MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -O2
>> MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND -O2
>> MVAPICH2 F77:   gfortran -L/lib -L/lib   -O2
>> MVAPICH2 FC:    gfortran   -O2
>> ```
>> The special circumstance is that there is no InfiniBand installed in my
>> computer, but I have to test CUDA. I found that the --enable-cuda configure
>> option doesn't work when I use --with-device=ch3:sock.
>>
>> Here are my questions:
>> * Is this CUDA error caused by the lack of an InfiniBand installation?
>> * Is there any way to test CUDA over a TCP/IP setup?
>>
>> Sorry for my poor English; I appreciate the MVAPICH team's work!
>>
>> Best regards,
>>
>> Dun
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>