[mvapich-discuss] CUDA running issue in MVAPICH2

khaled hamidouche hamidouc at cse.ohio-state.edu
Thu Apr 9 09:01:15 EDT 2015


Hi Dun,

The CUDA-aware support in MVAPICH2 is available only with the ch3 InfiniBand channel. Please refer to this section of the user guide for more details on how to configure and run it:
http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1-userguide.html#x1-120004.5
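
For reference, here is a minimal configure-and-run sketch, assuming CUDA is installed under /usr/local/cuda (as in your configure line) and reusing your debian81 hosts; note that CUDA support must also be enabled at runtime with the MV2_USE_CUDA parameter:

```
# configure with CUDA support; ch3:mrail (the InfiniBand channel) is the default device
./configure --prefix=$HOME/mvapich/build \
            --enable-cuda --with-cuda=/usr/local/cuda
make && make install

# run the OSU latency test on GPU (device) buffers;
# MV2_USE_CUDA=1 enables CUDA support at runtime
mpirun_rsh -np 2 debian81 debian81 MV2_USE_CUDA=1 ./osu_latency D D
```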


Further, to use GPUDirect RDMA (GDR), please use the MVAPICH2-GDR package
available here: http://mvapich.cse.ohio-state.edu/downloads/
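
As a rough illustration only (node1/node2 are placeholder hostnames), GPUDirect RDMA in MVAPICH2-GDR is controlled at runtime through parameters such as MV2_USE_CUDA and MV2_USE_GPUDIRECT; please see the MVAPICH2-GDR user guide for the authoritative list of parameters and their defaults:

```
# sketch only: enable CUDA support and GPUDirect RDMA at runtime
mpirun_rsh -np 2 node1 node2 MV2_USE_CUDA=1 MV2_USE_GPUDIRECT=1 ./osu_latency D D
```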

Please let us know if you face any issues.

Thanks


On Thu, Apr 9, 2015 at 8:33 AM, Dun Liang <randonlang at gmail.com> wrote:

> Dear developers:
>
> Currently I have some problems running MVAPICH2 with CUDA.
> The program is osu_latency; here is the error message:
> ```
> ┌─[liangdun at debian81] -
> [~/mvapich/mvapich2-2.1rc2_ib/mvapich2-2.1rc2/osu_benchmarks/.libs] -
> [2015-04-09 06:17:20]
> └─[1] <> mpirun_rsh -np 2 debian81 debian81 ./osu_latency D D
> # OSU MPI-CUDA Latency Test
> # Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
> # Size            Latency (us)
> [debian81:mpi_rank_0][error_sighandler] Caught error: Segmentation fault
> (signal 11)
> [debian81:mpispawn_0][readline] Unexpected End-Of-File on file descriptor
> 6. MPI process died?
> [debian81:mpispawn_0][mtpmi_processops] Error while reading PMI socket.
> MPI process died?
> [debian81:mpispawn_0][child_handler] MPI process (rank: 0, pid: 1376)
> terminated with signal 11 -> abort job
> [debian81:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node
> debian81 aborted: Error while reading a PMI socket (4)
>
> ```
> It works fine when I run `./osu_latency H H`:
> ```
> ┌─[liangdun at debian81] -
> [~/mvapich/mvapich2-2.1rc2_ib/mvapich2-2.1rc2/osu_benchmarks/.libs] -
> [2015-04-09 06:17:41]
> └─[1] <> mpirun_rsh -np 2 debian81 debian81 ./osu_latency H H
> # OSU MPI-CUDA Latency Test
> # Send Buffer on HOST (H) and Receive Buffer on HOST (H)
> # Size            Latency (us)
> 1                         0.28
> 2                         0.27
> 4                         0.27
> 8                         0.29
> 16                        0.27
> 32                        0.28
> 64                        0.31
> 128                       0.33
> 256                       0.39
> 512                       0.46
> 1024                      0.56
> 2048                      0.75
> 4096                      1.24
> 8192                      1.99
> 16384                     3.71
> 32768                     6.49
> 65536                     6.96
> 131072                   12.95
> 262144                   27.73
> 524288                   56.53
> 1048576                 113.61
> 2097152                 226.53
> 4194304                 628.29
>
> ```
>
> Here is my MPI version info:
> ```
> MVAPICH2 Version:       2.1rc2
> MVAPICH2 Release date:  Thu Mar 12 20:00:00 EDT 2014
> MVAPICH2 Device:        ch3:mrail
> MVAPICH2 configure:     --prefix=/home/liangdun/mvapich/build
> --enable-cuda --disable-mcast --with-cuda=/usr/local/cuda
> --with-device=ch3:mrail
> MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -O2
> MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND -O2
> MVAPICH2 F77:   gfortran -L/lib -L/lib   -O2
> MVAPICH2 FC:    gfortran   -O2
> ```
> The special circumstance is that there is no InfiniBand installed on my
> computer, but I still have to test CUDA. I found that the --enable-cuda
> configure option doesn't work when I use --with-device=ch3:sock.
>
> Here are my questions:
> * Is this CUDA error caused by the lack of an InfiniBand installation?
> * Is there any way to test CUDA with a TCP/IP setup?
>
> Sorry for my poor English; I appreciate the MVAPICH team's work!
>
> Best regards,
>
> Dun
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>