[mvapich-discuss] osu-micro-benchmark with gdr failed

Subramoni, Hari subramoni.1 at osu.edu
Wed May 22 09:56:20 EDT 2019


Hi, Fan.

Sorry to hear that you are facing issues.

Could you please let us know whether LD_PRELOAD has been set to point to libmpi.so?


E.g. LD_PRELOAD=<PATH_TO_MVAPICH2_GDR_INSTALL>/lib64/libmpi.so
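For instance, a minimal sketch (assuming MV2_PATH points at your MVAPICH2-GDR install prefix, as in your launch lines below), exporting the variable before the launch so that mpirun_rsh -export propagates it to the MPI processes:

```shell
# Hypothetical paths -- adjust MV2_PATH to your actual MVAPICH2-GDR prefix.
export MV2_PATH=/home/user/local/mpi/mvapich2-gdr
export LD_PRELOAD=$MV2_PATH/lib64/libmpi.so

# -export forwards the current environment (including LD_PRELOAD)
# to the launched ranks.
$MV2_PATH/bin/mpirun_rsh -export -np 2 10.2.5.141 10.2.5.141 \
    ./get_local_rank ./mpi/pt2pt/osu_latency D D
```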

Thx,
Hari.


From: mvapich-discuss <mvapich-discuss-bounces at cse.ohio-state.edu> On Behalf Of Fan Liang
Sent: Wednesday, May 22, 2019 9:52 AM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] osu-micro-benchmark with gdr failed


Hi, I hit a problem when running mpi/pt2pt/osu_latency D D after compiling the osu-micro-benchmarks against MVAPICH2-GDR. The log is as follows:

» $MV2_PATH/bin/mpirun_rsh -export -np 2 10.2.5.141 10.2.5.141 \
        ./get_local_rank ./mpi/pt2pt/osu_latency D D

# OSU MPI-CUDA Latency Test v5.6.1
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size          Latency (us)
0                       0.32
[dell-gpu141:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[dell-gpu141:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 5. MPI process died?
[dell-gpu141:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[dell-gpu141:mpispawn_0][child_handler] MPI process (rank: 0, pid: 45765) terminated with signal 11 -> abort job
[dell-gpu141:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node 10.2.5.141 aborted: Error while reading a PMI socket (4)

I compiled the benchmarks with the following command:

./configure --enable-cuda \
    --with-cuda=/usr/local/cuda \
    CC=/home/user/local/mpi/mvapich2-gdr/bin/mpicc \
    CXX=/home/user/local/mpi/mvapich2-gdr/bin/mpicxx

I installed MVAPICH2-GDR from mvapich2-gdr-mcast.cuda9.2.mofed4.5.gnu4.8.5-2.3.1-1.el7.x86_64.rpm.

The benchmarks shipped in the installed package work fine:

» $MV2_PATH/bin/mpirun_rsh -export -np 2 10.2.5.141 10.2.5.141 \
        ./get_local_rank $MV2_PATH/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency D D

# OSU MPI-CUDA Latency Test v5.6.1
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size          Latency (us)
0                       0.31
1                       1.52
2                       2.58
...

The self-compiled benchmarks also work fine for host-to-host tests:

» $MV2_PATH/bin/mpirun_rsh -export -np 2 10.2.5.141 10.2.5.141 \
        ./get_local_rank ./mpi/pt2pt/osu_latency

# OSU MPI Latency Test v5.6.1
# Size          Latency (us)
0                       0.33
1                       0.34
2                       0.34
...

The problem only occurs when running the self-compiled benchmarks in device (D D) mode.


--

Fan

