[mvapich-discuss] CUDA running issue in MVAPICH2

Dun Liang randonlang at gmail.com
Thu Apr 9 08:33:09 EDT 2015


Dear developers:

Currently I have some problems running MVAPICH2 with CUDA.
The program is osu_latency; here is the error message:
```
┌─[liangdun at debian81] -
[~/mvapich/mvapich2-2.1rc2_ib/mvapich2-2.1rc2/osu_benchmarks/.libs] -
[2015-04-09 06:17:20]
└─[1] <> mpirun_rsh -np 2 debian81 debian81 ./osu_latency D D
# OSU MPI-CUDA Latency Test
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size            Latency (us)
[debian81:mpi_rank_0][error_sighandler] Caught error: Segmentation fault
(signal 11)
[debian81:mpispawn_0][readline] Unexpected End-Of-File on file descriptor
6. MPI process died?
[debian81:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI
process died?
[debian81:mpispawn_0][child_handler] MPI process (rank: 0, pid: 1376)
terminated with signal 11 -> abort job
[debian81:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node
debian81 aborted: Error while reading a PMI socket (4)

```
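For reference, I launch the test exactly as shown above and have not set any special CUDA run-time parameters. If a run-time switch such as MV2_USE_CUDA=1 is needed (I have seen it mentioned in the user guide, but I am not sure whether it applies to my build), I suppose the invocation would look roughly like this:
```
# my guess at the launch command with the CUDA run-time flag set (untested assumption)
mpirun_rsh -np 2 debian81 debian81 MV2_USE_CUDA=1 ./osu_latency D D
```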
It works fine when I run `./osu_latency H H`:
```
┌─[liangdun at debian81] -
[~/mvapich/mvapich2-2.1rc2_ib/mvapich2-2.1rc2/osu_benchmarks/.libs] -
[2015-04-09 06:17:41]
└─[1] <> mpirun_rsh -np 2 debian81 debian81 ./osu_latency H H
# OSU MPI-CUDA Latency Test
# Send Buffer on HOST (H) and Receive Buffer on HOST (H)
# Size            Latency (us)
1                         0.28
2                         0.27
4                         0.27
8                         0.29
16                        0.27
32                        0.28
64                        0.31
128                       0.33
256                       0.39
512                       0.46
1024                      0.56
2048                      0.75
4096                      1.24
8192                      1.99
16384                     3.71
32768                     6.49
65536                     6.96
131072                   12.95
262144                   27.73
524288                   56.53
1048576                 113.61
2097152                 226.53
4194304                 628.29

```

Here is my MPI version info:
```
MVAPICH2 Version:       2.1rc2
MVAPICH2 Release date:  Thu Mar 12 20:00:00 EDT 2014
MVAPICH2 Device:        ch3:mrail
MVAPICH2 configure:     --prefix=/home/liangdun/mvapich/build --enable-cuda
--disable-mcast --with-cuda=/usr/local/cuda --with-device=ch3:mrail
MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -O2
MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND -O2
MVAPICH2 F77:   gfortran -L/lib -L/lib   -O2
MVAPICH2 FC:    gfortran   -O2
```
The special circumstance is that there is no InfiniBand adapter installed in this computer, but I still need to test CUDA. I found that the --enable-cuda configure option does not work when I use --with-device=ch3:sock (the configure line I tried is sketched below).
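From memory, the TCP/IP attempt was configured roughly like this (same prefix and CUDA path as the working build above; this is a reconstruction, so the exact options may differ slightly):
```
# rough reconstruction of my ch3:sock configure attempt
./configure --prefix=/home/liangdun/mvapich/build \
            --enable-cuda --with-cuda=/usr/local/cuda \
            --with-device=ch3:sock
make && make install
```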

Here are my questions:
* Is this CUDA error caused by the missing InfiniBand installation?
* Is there any way to test CUDA with a TCP/IP-only setup?

Sorry for my poor English, and I appreciate the MVAPICH team's work!

Best regards!

Dun