[mvapich-discuss] Segfault when running osu_put_latency with MVAPICH-GDR 2.3.2

Si, Min msi at anl.gov
Tue Sep 3 17:29:54 EDT 2019


Dear MVAPICH developers,

I am using MVAPICH2-GDR 2.3.2 with osu_put_latency from the OSU Micro-Benchmarks version 5.6.2. It segfaults at the MPI_Put call when both the origin buffer and the window buffer are in GPU memory.

I ran the benchmark using the following commands:
$ export MV2_USE_CUDA=1
$ export MV2_USE_GDRCOPY=0
$ export MV2_DEBUG_SHOW_BACKTRACE=1
$ <mvapich2_gdr_install_dir>/mpiexec -np 2 -ppn 2 ./mpi/mva2_gdr/libexec/osu-micro-benchmarks/mpi/one-sided/osu_put_latency -w create -s flush -m 4194304 -i 10000 -d cuda D D
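For reference, the pattern the benchmark exercises reduces to roughly the following. This is a simplified sketch, not the benchmark's actual source; error checking is omitted, the buffer size mirrors the -m flag above, and it assumes a CUDA-aware MPI (e.g. MVAPICH2-GDR with MV2_USE_CUDA=1), so it is not runnable on a plain MPI installation.

```c
/* Sketch of the osu_put_latency "D D" case: both the origin buffer
 * and the window buffer live in GPU memory. Requires a CUDA-aware
 * MPI; compile with the MPI wrapper and link against cudart. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    const size_t size = 4194304;            /* 4 MB, matching -m above */
    char *origin, *winbuf;
    MPI_Win win;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaMalloc((void **)&origin, size);     /* origin buffer on device */
    cudaMalloc((void **)&winbuf, size);     /* window buffer on device */

    /* -w create: expose the device buffer via MPI_Win_create */
    MPI_Win_create(winbuf, size, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (rank == 0) {
        MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);
        MPI_Put(origin, size, MPI_CHAR, 1, 0, size, MPI_CHAR, win);
        MPI_Win_flush(1, win);              /* -s flush synchronization */
        MPI_Win_unlock(1, win);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Win_free(&win);
    cudaFree(origin);
    cudaFree(winbuf);
    MPI_Finalize();
    return 0;
}
```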

Machine config: 4x NVIDIA Tesla V100 SXM2

Output:
# OSU MPI_Put-CUDA Latency Test v5.6.2
# Window creation: MPI_Win_create
# Synchronization: MPI_Win_flush
# Rank 0 Memory on DEVICE (D) and Rank 1 Memory on DEVICE (D)
# Size          Latency (us)
[gpu02.ftm.alcf.anl.gov:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[gpu02.ftm.alcf.anl.gov:mpi_rank_0][print_backtrace]   0: /home/minsi/local/mvapich2-gdr/2.3.2/mcast/no-openacc/cuda10.0/mofed4.4/mpirun/gcc4.8.5/lib64/libmpi.so.12(print_backtrace+0x1c) [0x7fbe5dfc860c]
[gpu02.ftm.alcf.anl.gov:mpi_rank_0][print_backtrace]   1: /home/minsi/local/mvapich2-gdr/2.3.2/mcast/no-openacc/cuda10.0/mofed4.4/mpirun/gcc4.8.5/lib64/libmpi.so.12(error_sighandler+0x59) [0x7fbe5dfc8709]
[gpu02.ftm.alcf.anl.gov:mpi_rank_0][print_backtrace]   2: /lib64/libpthread.so.0(+0xf680) [0x7fbe5e61b680]
[gpu02.ftm.alcf.anl.gov:mpi_rank_0][print_backtrace]   3: /home/minsi/local/mvapich2-gdr/2.3.2/mcast/no-openacc/cuda10.0/mofed4.4/mpirun/gcc4.8.5/lib64/libmpi.so.12(MPIDI_CH3I_Put+0x1050) [0x7fbe5df26360]
[gpu02.ftm.alcf.anl.gov:mpi_rank_0][print_backtrace]   4: /home/minsi/local/mvapich2-gdr/2.3.2/mcast/no-openacc/cuda10.0/mofed4.4/mpirun/gcc4.8.5/lib64/libmpi.so.12(MPID_Put+0x23) [0x7fbe5df2d4e3]
[gpu02.ftm.alcf.anl.gov:mpi_rank_0][print_backtrace]   5: /home/minsi/local/mvapich2-gdr/2.3.2/mcast/no-openacc/cuda10.0/mofed4.4/mpirun/gcc4.8.5/lib64/libmpi.so.12(MPI_Put+0x769) [0x7fbe5debd619]
[gpu02.ftm.alcf.anl.gov:mpi_rank_0][print_backtrace]   6: ./mpi/mva2_gdr/libexec/osu-micro-benchmarks/mpi/one-sided/osu_put_latency() [0x402c61]
[gpu02.ftm.alcf.anl.gov:mpi_rank_0][print_backtrace]   7: ./mpi/mva2_gdr/libexec/osu-micro-benchmarks/mpi/one-sided/osu_put_latency() [0x402601]
[gpu02.ftm.alcf.anl.gov:mpi_rank_0][print_backtrace]   8: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fbe5d17b3d5]
[gpu02.ftm.alcf.anl.gov:mpi_rank_0][print_backtrace]   9: ./mpi/mva2_gdr/libexec/osu-micro-benchmarks/mpi/one-sided/osu_put_latency() [0x40201b]

I believe CUDA-aware MPI RMA is supported in MVAPICH2-GDR, as listed at http://mvapich.cse.ohio-state.edu/userguide/gdr/#_cuda_aware_mpi_primitives
I am not sure if I missed any setting. Could you please help me with this issue?

Please let me know if you need any information. Thanks.

Best regards,
Min
On 2019/09/03 10:03, Smith, Jeff wrote:
Hi Min,

I have generated your requested RPM. You will find it on our download page at http://mvapich.cse.ohio-state.edu/downloads/ under User Requested RPMs, in the MOFED 4.5 section.

Let me know if there is anything else I can help with.

Thanks,
Jeff
________________________________
From: Si, Min <msi at anl.gov>
Sent: Wednesday, August 28, 2019 4:56 PM
To: _ENG CSE Mvapich-Help <ENG-cse-mvapich-help at osu.edu>
Subject: Need MVAPICH2-GDR with MLNX-OFED 4.5 and CUDA-9.1

Hi,

I want to use MVAPICH2-GDR on the ALCF Cooley cluster:
https://www.alcf.anl.gov/user-guides/cooley

The latest OFED and CUDA on the system are MLNX-OFED 4.5 and CUDA 9.1,
with GNU 4.8.5. However, I could not find a corresponding binary on
your website. Could you please generate it for me? Thanks.

Best regards,
Min
