[mvapich-discuss] issue with isend/recv to/from self with GPU memory

Panda, Dhabaleswar panda at cse.ohio-state.edu
Thu Jun 21 21:24:24 EDT 2018


Hi Tom,

Thanks for your report.

It looks like you are using the basic MVAPICH2 version. Please note that advanced features and optimizations related to GPUs are available in the MVAPICH2-GDR version, and this functionality is supported there.

May I request that you try the MVAPICH2-GDR version and let us know if you encounter any issues?

Thanks,

DK

________________________________
From: mvapich-discuss-bounces at cse.ohio-state.edu on behalf of Benson, Tom [benson31 at llnl.gov]
Sent: Thursday, June 21, 2018 4:41 PM
To: mvapich-discuss at cse.ohio-state.edu
Subject: [mvapich-discuss] issue with isend/recv to/from self with GPU memory

Hi,

I have encountered an issue with recv-ing from self when the send buffer or the recv buffer resides on a GPU. I realize this is a rather pathological degenerate case, but it is supported in the CPU-only case.

Essentially, MPIDI_CH3U_RecvFromSelf() does not handle the case where the pointers refer to device memory. The pointers end up in a call to memcpy (good ol’ libc’s memcpy), which, not unexpectedly, fails fairly catastrophically.
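
To make the missing check concrete, here is a rough sketch in plain C++ of the decision a self-receive path would need to make before copying. Every name in it (MemKind, CopyPath, choose_copy_path, self_copy) is hypothetical and is not taken from the MVAPICH2 source. In real code the memory kind would presumably come from something like cudaPointerGetAttributes() and the non-host paths would call cudaMemcpy() with the matching kind; here those paths are only reported, not performed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical names throughout; this is an illustration, not MVAPICH2 code.
enum class MemKind { Host, Device };

enum class CopyPath { HostMemcpy, HostToDevice, DeviceToHost, DeviceToDevice };

// Decide which copy routine is valid for a (source, destination) pair.
CopyPath choose_copy_path(MemKind src, MemKind dst) {
    if (src == MemKind::Host && dst == MemKind::Host) return CopyPath::HostMemcpy;
    if (src == MemKind::Host) return CopyPath::HostToDevice;
    if (dst == MemKind::Host) return CopyPath::DeviceToHost;
    return CopyPath::DeviceToDevice;
}

// Copy for a send-to-self. Only the host-to-host path is implemented here.
// Returns false when a device-aware copy would be required, instead of
// handing a device pointer to libc memcpy.
bool self_copy(void* dst, MemKind dst_kind,
               const void* src, MemKind src_kind, std::size_t nbytes) {
    if (choose_copy_path(src_kind, dst_kind) != CopyPath::HostMemcpy) {
        return false;  // a real implementation would dispatch to a CUDA copy here
    }
    std::memcpy(dst, src, nbytes);
    return true;
}
```

The point of the sketch is only that the classification has to happen before the copy; the failure described above corresponds to skipping it and always taking the HostMemcpy branch.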

A standalone reproducer program is attached. Compile with something like the following (sorry about the C++11; it's not critical to anything in this program, I’m just lazy):

nvcc -ccbin=mpicxx -std=c++11 -g -O0 -DDO_CPU_TO_GPU cuda_mpi_isend_recv_samerank.cpp

Run with something like:

MV2_USE_CUDA=1 ./a.out

Example output with backtrace attached as output.txt.

Other details:

The only configuration flags I’m using are “--enable-cuda --with-cuda=/path/to/cuda-9.1 --enable-g=all --enable-fast=none”. The MVAPICH2 versions tested were 2.3rc2 and trunk. The nodes run standard Linux, something in the RHEL family.

Let me know if I can provide any additional details.

Cheers,
Tom

--

Tom Benson, Ph.D.
Computer Scientist
Lawrence Livermore National Laboratory
benson31 at llnl.gov

-------------- next part --------------
A non-text attachment was scrubbed...
Name: winmail.dat
Type: application/ms-tnef
Size: 12495 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20180622/49bb77f6/attachment.bin>

