[mvapich-discuss] CUDA running issue in MVAPICH2
khaled hamidouche
hamidouc at cse.ohio-state.edu
Thu Apr 9 09:01:15 EDT 2015
Hi Dun,
The CUDA-aware support in MVAPICH2 is available only with the ch3 InfiniBand channel. Please refer to
this section for more details on how to configure and run:
http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1-userguide.html#x1-120004.5
Further, to use GPUDirect RDMA (GDR), please use the MVAPICH2-GDR
package available here: http://mvapich.cse.ohio-state.edu/downloads/
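For reference, a minimal sketch of configuring and running with CUDA support, based on the flags documented in the user guide (the install prefix, CUDA path, and hostnames below are placeholders, not your actual values):

```shell
# Configure MVAPICH2 with CUDA support on the InfiniBand (ch3:mrail) channel.
# Adjust --prefix and --with-cuda to match your system.
./configure --prefix=$HOME/mvapich2-install \
            --with-device=ch3:mrail \
            --enable-cuda --with-cuda=/usr/local/cuda
make -j4 && make install

# At run time, enable CUDA support explicitly via the documented
# MV2_USE_CUDA knob, then run the device-to-device latency test:
mpirun_rsh -np 2 host1 host2 MV2_USE_CUDA=1 ./osu_latency D D
```

Note that without MV2_USE_CUDA=1 set at run time, passing device buffers (D D) to the benchmark can crash, since the library does not detect GPU buffers by default.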
Please let us know if you face any issues.
Thanks
On Thu, Apr 9, 2015 at 8:33 AM, Dun Liang <randonlang at gmail.com> wrote:
> Dear developers:
>
> Currently I have some problems running MVAPICH2 with CUDA.
> The program is osu_latency; here is the error message:
> ```
> ┌─[liangdun at debian81] -
> [~/mvapich/mvapich2-2.1rc2_ib/mvapich2-2.1rc2/osu_benchmarks/.libs] -
> [2015-04-09 06:17:20]
> └─[1] <> mpirun_rsh -np 2 debian81 debian81 ./osu_latency D D
> # OSU MPI-CUDA Latency Test
> # Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
> # Size Latency (us)
> [debian81:mpi_rank_0][error_sighandler] Caught error: Segmentation fault
> (signal 11)
> [debian81:mpispawn_0][readline] Unexpected End-Of-File on file descriptor
> 6. MPI process died?
> [debian81:mpispawn_0][mtpmi_processops] Error while reading PMI socket.
> MPI process died?
> [debian81:mpispawn_0][child_handler] MPI process (rank: 0, pid: 1376)
> terminated with signal 11 -> abort job
> [debian81:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node
> debian81 aborted: Error while reading a PMI socket (4)
>
> ```
> It works fine when I run `./osu_latency H H`:
> ```
> ┌─[liangdun at debian81] -
> [~/mvapich/mvapich2-2.1rc2_ib/mvapich2-2.1rc2/osu_benchmarks/.libs] -
> [2015-04-09 06:17:41]
> └─[1] <> mpirun_rsh -np 2 debian81 debian81 ./osu_latency H H
> # OSU MPI-CUDA Latency Test
> # Send Buffer on HOST (H) and Receive Buffer on HOST (H)
> # Size Latency (us)
> 1 0.28
> 2 0.27
> 4 0.27
> 8 0.29
> 16 0.27
> 32 0.28
> 64 0.31
> 128 0.33
> 256 0.39
> 512 0.46
> 1024 0.56
> 2048 0.75
> 4096 1.24
> 8192 1.99
> 16384 3.71
> 32768 6.49
> 65536 6.96
> 131072 12.95
> 262144 27.73
> 524288 56.53
> 1048576 113.61
> 2097152 226.53
> 4194304 628.29
>
> ```
>
> Here is my MPI version info:
> ```
> MVAPICH2 Version: 2.1rc2
> MVAPICH2 Release date: Thu Mar 12 20:00:00 EDT 2014
> MVAPICH2 Device: ch3:mrail
> MVAPICH2 configure: --prefix=/home/liangdun/mvapich/build
> --enable-cuda --disable-mcast --with-cuda=/usr/local/cuda
> --with-device=ch3:mrail
> MVAPICH2 CC: gcc -DNDEBUG -DNVALGRIND -O2
> MVAPICH2 CXX: g++ -DNDEBUG -DNVALGRIND -O2
> MVAPICH2 F77: gfortran -L/lib -L/lib -O2
> MVAPICH2 FC: gfortran -O2
> ```
> The special circumstance is that there is no InfiniBand hardware installed in my
> computer, but I have to test CUDA. I found that the --enable-cuda configuration
> doesn't work when I use --with-device=ch3:sock.
>
> Here are my questions:
> * Is this CUDA error caused by the missing InfiniBand installation?
> * Is there any way to test CUDA with a TCP/IP setup?
>
> Sorry for my poor English; I appreciate the MVAPICH team's work!
>
> Best regards,
>
> Dun
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>