[mvapich-discuss] osu-micro-benchmark with gdr failed

Subramoni, Hari subramoni.1 at osu.edu
Wed May 22 23:01:40 EDT 2019


Hi, Fan.

Good to know that things are working for you now.

This looks like a tuning issue with the eager-threshold setting. Could you please provide more details about your system (CPU, GPU, OFED version, CUDA version, HCA, kernel version)? This will enable us to give a better answer.
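For reference, the details listed above can usually be gathered with standard tools. This is a sketch, assuming the NVIDIA, CUDA, and Mellanox utilities are installed on the node; lines for tools that are missing are simply skipped.

```shell
# Collect the system details requested above (CPU, GPU, OFED, CUDA, HCA, kernel).
# Each vendor tool is probed first so missing ones are skipped, not fatal.
echo "Kernel: $(uname -r)"
lscpu 2>/dev/null | grep 'Model name'                             # CPU model
command -v nvidia-smi >/dev/null 2>&1 && \
    nvidia-smi --query-gpu=name --format=csv,noheader             # GPU model
command -v ofed_info >/dev/null 2>&1 && ofed_info -s              # MOFED version
command -v nvcc >/dev/null 2>&1 && nvcc --version | tail -n1      # CUDA version
command -v ibstat >/dev/null 2>&1 && ibstat | grep -E 'CA |rate'  # HCA details
true
```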

Thx,
Hari.

From: Fan Liang <chcdlf at gmail.com>
Sent: Thursday, May 23, 2019 8:24 AM
To: Subramoni, Hari <subramoni.1 at osu.edu>
Cc: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: Re: [mvapich-discuss] osu-micro-benchmark with gdr failed

Thanks, Hari. It works after setting LD_PRELOAD.
However, I found some odd results:

1. In the get_bandwidth case, the D2H latency (Node1 GPU to Node2 host) drops sharply at a 16K message size.
2. In the get_bandwidth case, the H2D bandwidth is much lower than in the other cases.
3. In the get_latency case, the H2D latency is much larger than in the other cases.
4. In the put_bandwidth case, the D2D bandwidth drops sharply at a 16K message size.

Should any parameters be explicitly set when running these tests?
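For context, MVAPICH2-GDR does expose run-time parameters that affect exactly these curves. The sketch below shows a few of them; the numeric values are illustrative placeholders, not tuned recommendations, and the right settings depend on the system details Hari asked about.

```shell
# Illustrative MVAPICH2-GDR knobs (values are placeholders, not recommendations):
export MV2_USE_CUDA=1                    # enable CUDA-aware communication paths
export MV2_IBA_EAGER_THRESHOLD=131072    # eager/rendezvous switch point (bytes)
export MV2_GPUDIRECT_LIMIT=8192          # max message size using GPUDirect RDMA
```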

Results:

Get Bandwidth Test

[image.png]

Get Latency Test

[image.png]

Put Bandwidth Test

[image.png]





Subramoni, Hari <subramoni.1 at osu.edu> wrote on Wed, May 22, 2019, 9:56 PM:
Hi, Fan.

Sorry to hear that you are facing issues.

Could you please let us know if the LD_PRELOAD has been set to point to the path to libmpi.so?


E.g. LD_PRELOAD=<PATH_TO_MVAPICH2_GDR_INSTALL>/lib64/libmpi.so
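With mpirun_rsh, environment variables such as LD_PRELOAD can also be passed on the command line, just before the executable, so that they reach every rank. A sketch (the hostnames and install path are illustrative):

```shell
# Hypothetical invocation: pass LD_PRELOAD through mpirun_rsh so every rank
# loads the GDR-enabled libmpi.so (adjust path and hosts for your system).
$MV2_PATH/bin/mpirun_rsh -export -np 2 host1 host2 \
    LD_PRELOAD=$MV2_PATH/lib64/libmpi.so \
    ./get_local_rank ./mpi/pt2pt/osu_latency D D
```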

Thx,
Hari.

From: mvapich-discuss <mvapich-discuss-bounces at cse.ohio-state.edu> On Behalf Of Fan Liang
Sent: Wednesday, May 22, 2019 9:52 AM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] osu-micro-benchmark with gdr failed


Hi, I hit a problem running mpi/pt2pt/osu_latency D D after compiling osu-micro-benchmarks against mvapich2-gdr. The log is as follows:

» $MV2_PATH/bin/mpirun_rsh -export -np 2 10.2.5.141 10.2.5.141 \

        ./get_local_rank ./mpi/pt2pt/osu_latency D D

# OSU MPI-CUDA Latency Test v5.6.1

# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)

# Size          Latency (us)

0                       0.32

[dell-gpu141:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)

[dell-gpu141:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 5. MPI process died?

[dell-gpu141:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?

[dell-gpu141:mpispawn_0][child_handler] MPI process (rank: 0, pid: 45765) terminated with signal 11 -> abort job

[dell-gpu141:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node 10.2.5.141 aborted: Error while reading a PMI socket (4)

I compiled the benchmark with the following command:

./configure --enable-cuda \

    --with-cuda=/usr/local/cuda \

    CC=/home/user/local/mpi/mvapich2-gdr/bin/mpicc \

    CXX=/home/user/local/mpi/mvapich2-gdr/bin/mpicxx

I installed mvapich2-gdr from mvapich2-gdr-mcast.cuda9.2.mofed4.5.gnu4.8.5-2.3.1-1.el7.x86_64.rpm.

The benchmarks shipped in the installed package work fine:

» $MV2_PATH/bin/mpirun_rsh -export -np 2 10.2.5.141 10.2.5.141 \

        ./get_local_rank $MV2_PATH/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency D D

# OSU MPI-CUDA Latency Test v5.6.1

# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)

# Size          Latency (us)

0                       0.31

1                       1.52

2                       2.58

...

The self-compiled benchmarks also work fine for host-to-host tests:

» $MV2_PATH/bin/mpirun_rsh -export -np 2 10.2.5.141 10.2.5.141 \

        ./get_local_rank ./mpi/pt2pt/osu_latency

# OSU MPI Latency Test v5.6.1

# Size          Latency (us)

0                       0.33

1                       0.34

2                       0.34

...

The problem occurs only with the self-compiled benchmarks in device tests.


--

Fan


--
梁帆
Fan Liang
中国科学院计算技术研究所,100190
Institute Of Computing Technology Chinese Academy Of Sciences, 100190
E-mail: chcdlf at gmail.com, liangfan at ict.ac.cn
Tel: 13141474339