[mvapich-discuss] osu-micro-benchmark with gdr failed

Fan Liang chcdlf at gmail.com
Wed May 22 00:21:32 EDT 2019


Hi, I get a segmentation fault when running mpi/pt2pt/osu_latency D D after
compiling the OSU Micro-Benchmarks myself against MVAPICH2-GDR. The log is as follows:

» $MV2_PATH/bin/mpirun_rsh -export -np 2 10.2.5.141 10.2.5.141 \
        ./get_local_rank ./mpi/pt2pt/osu_latency D D
# OSU MPI-CUDA Latency Test v5.6.1
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size          Latency (us)
0                       0.32
[dell-gpu141:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[dell-gpu141:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 5. MPI process died?
[dell-gpu141:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[dell-gpu141:mpispawn_0][child_handler] MPI process (rank: 0, pid: 45765) terminated with signal 11 -> abort job
[dell-gpu141:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node 10.2.5.141 aborted: Error while reading a PMI socket (4)

I compiled the benchmarks with the following command:

./configure --enable-cuda \
    --with-cuda=/usr/local/cuda \
    CC=/home/user/local/mpi/mvapich2-gdr/bin/mpicc \
    CXX=/home/user/local/mpi/mvapich2-gdr/bin/mpicxx

I installed MVAPICH2-GDR from the RPM
mvapich2-gdr-mcast.cuda9.2.mofed4.5.gnu4.8.5-2.3.1-1.el7.x86_64.rpm.

The device test works fine with the benchmarks shipped in the installed package:

» $MV2_PATH/bin/mpirun_rsh -export -np 2 10.2.5.141 10.2.5.141 \
        ./get_local_rank \
        $MV2_PATH/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency D D
# OSU MPI-CUDA Latency Test v5.6.1
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size          Latency (us)
0                       0.31
1                       1.52
2                       2.58
...

The self-compiled benchmarks also work fine for host tests:

» $MV2_PATH/bin/mpirun_rsh -export -np 2 10.2.5.141 10.2.5.141 \
        ./get_local_rank ./mpi/pt2pt/osu_latency
# OSU MPI Latency Test v5.6.1
# Size          Latency (us)
0                       0.33
1                       0.34
2                       0.34
...

The problem only occurs when the self-compiled benchmarks run device tests.


-- 

Fan