[mvapich-discuss] Segfault w/ GPUDirect MPISend fired from CUDA Callback (SMP)

Paul Sathre sath6220 at cs.vt.edu
Fri Feb 13 17:11:48 EST 2015


Thanks for the quick reply, Khaled,

1) I will work on isolating a minimal test case next week.
2) No. Would it be safe to have the callback function call MPI_Init_thread
immediately before the MPI_Isend (i.e., much later than the global MPI_Init at
the start of the program)? Can I similarly call MPI_finalize_thread in the
callback function? (My current understanding of the intended initialization is
sketched below this list.)
3) No, the main thread is actually blocking on a return from
cudaThreadSynchronize to force the pack kernel to finish (because CUDA, in
turn, is blocking on the return of the segfaulting callback function).
4) The thread is created by the CUDA runtime. I am unsure whether or not it
has the same context, but I would lean towards thinking the CUDA runtime is
smart enough to ensure it does.
5) I am unsure whether they are on the same socket and capable of peer
access; I will have to check. (I did test with both MV2_CUDA_IPC=0 and =1,
though, and both segfaulted.)
6) I was under the impression that MPI_Pack was for packing custom datatypes,
whereas we are packing arbitrary regions of a multi-dimensional grid based on
arrays of user-specified offset/contig_length pairs. Am I incorrect, and is
there a lightweight way to achieve this through custom datatypes (something
like the hindexed sketch below this list)? Also, we seamlessly interchange
between OpenCL, CUDA, and OpenMP backends, so we are still stuck implementing
an OpenCL pack kernel for transparent execution. However, its callback chain
isn't subject to this segfault, since we have to host-stage the buffer before
the MPI transfer anyway. That is, unless you have a GPUDirect equivalent for
OpenCL that I'm unaware of - which we'd be *very* interested in =)
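
Re: question 2, here is a minimal sketch of how I understand the
initialization is supposed to look if the send really must come from the CUDA
callback thread - requesting MPI_THREAD_MULTIPLE once at startup rather than
calling MPI_Init_thread from inside the callback. This is just my reading of
the standard, not something I have verified against MVAPICH2 yet:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided = MPI_THREAD_SINGLE;

    /* Request full multithreading once, up front, since MPI_Isend will later
       be issued from a CUDA runtime callback thread. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not provided (got %d)\n", provided);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* ... enqueue pack kernel, register callback, test requests, etc. ... */

    MPI_Finalize();   /* one finalize, from the main thread, at shutdown */
    return 0;
}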
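
And re: question 6, this is the kind of datatype-based description I was
wondering about as a possible alternative to our explicit pack kernel - a
rough sketch only, with made-up names (nblocks, contig_lengths, byte_offsets)
standing in for our offset/contig_length arrays:

#include <mpi.h>

/* Describe one exchange region as nblocks contiguous runs of doubles, where
   run i is contig_lengths[i] elements long and starts byte_offsets[i] bytes
   into the grid buffer. */
MPI_Datatype build_region_type(int nblocks, const int *contig_lengths,
                               const MPI_Aint *byte_offsets)
{
    MPI_Datatype region;
    MPI_Type_create_hindexed(nblocks, contig_lengths, byte_offsets,
                             MPI_DOUBLE, &region);
    MPI_Type_commit(&region);
    return region;   /* send with count = 1; MPI_Type_free when done */
}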

Thanks again!

-Paul Sathre
Research Programmer - Synergy Lab
Dept. of Computer Science
Virginia Tech

On Fri, Feb 13, 2015 at 4:44 PM, khaled hamidouche <
hamidouc at cse.ohio-state.edu> wrote:

> Hi Paul,
>
> In order to help debug this issue, can you please provide us with some more
> information:
>
> 1) Can we have a reproducer of this issue? This will help us debug the
> issue faster.
> 2) Does your example use MPI_Init_thread? Your scenario falls under
> MPI_THREAD_MULTIPLE, so MPI needs to be aware of it.
> 3) Are the callback thread and the main thread accessing the same buffer
> at the same time?
> 4) How is the thread created? Is the thread created with the same context
> as the process?
> 5) In your system configuration, are both GPUs on the same socket (i.e.,
> can IPC be used)? If yes, does enabling IPC hit the same issue (segfault
> at memcpy)?
> 6) What is the exact use case for this? In other words, why is MPI_Pack
> (the MVAPICH2 pack kernel) not sufficient?
>
>
> Thanks a lot
>
> On Fri, Feb 13, 2015 at 2:30 PM, Paul Sathre <sath6220 at cs.vt.edu> wrote:
>
>> Hi,
>>
>> I am constructing a library which requires fully asynchronous "pack and
>> send" functionality, with a custom pack kernel and (hopefully) a GPUDirect
>> send. Therefore I have set up a pipeline via CUDA's callback mechanism, such
>> that when the custom pack kernel completes asynchronously, the CUDA runtime
>> automatically triggers a small function which launches an MPI_Isend of the
>> packed device buffer and stores the request for the user application to
>> test later. We are currently only testing intra-node exchanges via SMP.
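>>
>> To make the structure concrete, a stripped-down sketch of the pipeline is
>> below (the function and variable names are simplified placeholders for
>> illustration, not the actual code in metamorph_mpi.c):
>>
>> #include <cuda_runtime.h>
>> #include <mpi.h>
>>
>> struct sap_state { void *d_packed; int count; int dest; MPI_Request req; };
>>
>> /* Fired by the CUDA runtime once the pack kernel has drained from the
>>    stream. */
>> static void CUDART_CB sap_isend_cb(cudaStream_t stream, cudaError_t status,
>>                                    void *data)
>> {
>>     struct sap_state *s = (struct sap_state *)data;
>>     if (status != cudaSuccess) return;
>>     /* d_packed is a device pointer; we rely on MVAPICH2's CUDA support to
>>        recognize that and keep the transfer on the GPU-aware path. */
>>     MPI_Isend(s->d_packed, s->count, MPI_DOUBLE, s->dest, 0,
>>               MPI_COMM_WORLD, &s->req);
>> }
>>
>> /* Called right after the pack kernel is enqueued on `stream`. */
>> static void register_send(cudaStream_t stream, struct sap_state *state)
>> {
>>     cudaStreamAddCallback(stream, sap_isend_cb, state, 0);
>> }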
>>
>> However, this segfaults with the following backtrace (for the eager
>> protocol; rendezvous similarly fails in __memcpy_sse2_unaligned):
>>
>> #0  __memcpy_sse2_unaligned ()
>>     at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:37
>> #1  0x00007f0abf8c13d7 in MPIDI_CH3I_SMP_writev ()
>>    from /home/psath/mvapich2-2.1rc1/build/install/lib/libmpi.so.12
>> #2  0x00007f0abf8b6026 in MPIDI_CH3_iSendv ()
>>    from /home/psath/mvapich2-2.1rc1/build/install/lib/libmpi.so.12
>> #3  0x00007f0abf8a4c87 in MPIDI_CH3_EagerContigIsend ()
>>    from /home/psath/mvapich2-2.1rc1/build/install/lib/libmpi.so.12
>> #4  0x00007f0abf8ab9c1 in MPID_Isend ()
>>    from /home/psath/mvapich2-2.1rc1/build/install/lib/libmpi.so.12
>> #5  0x00007f0abf83272d in PMPI_Isend ()
>>    from /home/psath/mvapich2-2.1rc1/build/install/lib/libmpi.so.12
>> #6  0x00007f0abfc79a1f in cuda_sap_isend_cb (stream=0x0,
>> status=cudaSuccess,
>>     data=0xb52d70) at metamorph_mpi.c:435
>>
>> I am able to successfully transfer the same device buffer from the
>> primary thread of the application, but when the MPI_Isend is launched from
>> the third thread (created by the CUDA driver to invoke the callback
>> function), the library seems not to recognize that it is still a device
>> pointer that cannot be copied with a CPU memcpy.
>>
>> Hao Wang, who is currently at our lab, suggested explicitly disabling IPC
>> (and, separately, trying to *enable* SMP_IPC), which I attempted, but it
>> didn't help.
>>
>> We are using MVAPICH2 2.1rc1.
>> The configure line is:
>>
>> ../mvapich2-2.1rc1/configure --prefix=/home/psath/mvapich2-2.1rc1/build/install
>> --enable-cuda --disable-mcast
>> --with-ib-libpath=/home/psath/libibverbs/install/lib
>> --with-ib-include=/home/psath/libibverbs/install/include
>> --with-libcuda=/usr/local/cuda-6.0/lib64
>> --with-libcudart=/usr/local/cuda-6.0/lib64/
>>
>> The system has 2 K20x GPUs running Nvidia driver 331.67. We are using a
>> userspace build of libibverbs.so v1.1.8-1 from the Debian repos.
>>
>> Have you observed a use case like this before with similar segfaults? Do
>> you have any further suggestions for tests or workarounds that would
>> preserve the GPUDirect behavior? (Forcing the callback to stall the
>> transfer and place it on a helper list for the main thread to come back
>> around to would incur additional polling overhead that should not be
>> required, and it bends the async model we are trying to implement.)
>>
>>
>> Thanks!
>> -Paul Sathre
>> Research Programmer - Synergy Lab
>> Dept. of Computer Science
>> Virginia Tech
>>
>>