[mvapich-discuss] osu-micro-benchmark with gdr failed
Subramoni, Hari
subramoni.1 at osu.edu
Wed May 22 09:56:20 EDT 2019
Hi Fan,
Sorry to hear that you are facing issues.
Could you please let us know whether LD_PRELOAD has been set to point to libmpi.so from your MVAPICH2-GDR installation?
E.g. LD_PRELOAD=<PATH_TO_MVAPICH2_GDR_INSTALL>/lib64/libmpi.so
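As a sketch, using the install prefix mentioned later in this thread (adjust the path to your actual MVAPICH2-GDR installation):

```shell
# Hypothetical paths based on this thread; verify against your install.
MV2_PATH=/home/user/local/mpi/mvapich2-gdr
export LD_PRELOAD=$MV2_PATH/lib64/libmpi.so

# -export propagates the current environment (including LD_PRELOAD)
# to the launched MPI processes.
$MV2_PATH/bin/mpirun_rsh -export -np 2 10.2.5.141 10.2.5.141 \
    ./get_local_rank ./mpi/pt2pt/osu_latency D D
```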
Thx,
Hari.
From: mvapich-discuss <mvapich-discuss-bounces at cse.ohio-state.edu> On Behalf Of Fan Liang
Sent: Wednesday, May 22, 2019 9:52 AM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] osu-micro-benchmark with gdr failed
Hi, I have a problem when running mpi/pt2pt/osu_latency D D after compiling the OSU micro-benchmarks with MVAPICH2-GDR. The log is as follows:
» $MV2_PATH/bin/mpirun_rsh -export -np 2 10.2.5.141 10.2.5.141 \
./get_local_rank ./mpi/pt2pt/osu_latency D D
# OSU MPI-CUDA Latency Test v5.6.1
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size Latency (us)
0 0.32
[dell-gpu141:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[dell-gpu141:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 5. MPI process died?
[dell-gpu141:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[dell-gpu141:mpispawn_0][child_handler] MPI process (rank: 0, pid: 45765) terminated with signal 11 -> abort job
[dell-gpu141:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node 10.2.5.141 aborted: Error while reading a PMI socket (4)
I compiled the benchmarks with the following command:
./configure --enable-cuda \
--with-cuda=/usr/local/cuda \
CC=/home/user/local/mpi/mvapich2-gdr/bin/mpicc \
CXX=/home/user/local/mpi/mvapich2-gdr/bin/mpicxx
I installed MVAPICH2-GDR from mvapich2-gdr-mcast.cuda9.2.mofed4.5.gnu4.8.5-2.3.1-1.el7.x86_64.rpm.
It works fine when using the benchmarks in the installed package.
» $MV2_PATH/bin/mpirun_rsh -export -np 2 10.2.5.141 10.2.5.141 \
./get_local_rank $MV2_PATH/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency D D
# OSU MPI-CUDA Latency Test v5.6.1
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size Latency (us)
0 0.31
1 1.52
2 2.58
...
It works fine when using the self-compiled benchmarks for host tests.
» $MV2_PATH/bin/mpirun_rsh -export -np 2 10.2.5.141 10.2.5.141 \
./get_local_rank ./mpi/pt2pt/osu_latency
# OSU MPI Latency Test v5.6.1
# Size Latency (us)
0 0.33
1 0.34
2 0.34
...
It only fails when using the self-compiled benchmarks for device tests.
--
Fan