[mvapich-discuss] MVAPICH and CUDA IPC

Kate Clark mclark at nvidia.com
Thu Feb 7 18:29:13 EST 2019


Hi MVAPICH developers,

I’m seeing occasional lock-ups (public MVAPICH) or segmentation faults (MVAPICH-GDR) when using MVAPICH for CUDA IPC message exchange within a node, in the case where the send/recv buffers have already been registered for CUDA IPC before the call to MPI, i.e., their memory handles have already been exchanged between the source and destination processes.  The issue doesn’t seem to arise if the buffers are not registered prior to MPI.
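
To make this concrete, the pattern looks roughly like the following sketch (illustrative only: error checking is omitted, and the function and variable names are mine):

    #include <cuda_runtime.h>
    #include <mpi.h>

    /* Minimal sketch of the pattern (error checking omitted): each rank
     * allocates a device buffer, exchanges IPC handles with a peer rank
     * on the same node, and opens the peer's handle, all before any MPI
     * transfer touches the buffer. */
    void ipc_register_then_mpi(int rank, int peer, size_t bytes)
    {
        void *d_buf = NULL, *d_peer = NULL;
        cudaIpcMemHandle_t mine, theirs;

        cudaMalloc(&d_buf, bytes);
        cudaIpcGetMemHandle(&mine, d_buf);

        /* Application-level handle exchange between the two ranks. */
        MPI_Sendrecv(&mine, (int)sizeof(mine), MPI_BYTE, peer, 0,
                     &theirs, (int)sizeof(theirs), MPI_BYTE, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaIpcOpenMemHandle(&d_peer, theirs, cudaIpcMemLazyEnablePeerAccess);

        /* The buffers are now IPC-registered by the application.  The
         * subsequent intra-node MPI exchange on the same buffers is
         * where the hang / segfault shows up. */
        if (rank < peer)
            MPI_Send(d_buf, (int)bytes, MPI_BYTE, peer, 1, MPI_COMM_WORLD);
        else
            MPI_Recv(d_buf, (int)bytes, MPI_BYTE, peer, 1, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }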

As per the CUDA documentation, a given memory handle can only be opened once per context per device:

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g01050a29fefde385b1042081ada4cde9


  * cudaIpcMemHandles from each device in a given process may only be opened by one context per device per other process.
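
A small illustration of what this means in practice, assuming the handle arrived from another process on the same node:

    #include <stdio.h>
    #include <cuda_runtime.h>

    /* `handle` is assumed to have been produced by cudaIpcGetMemHandle in
     * another process on this node.  The first open in this process
     * succeeds, but a second open of the same handle does not return the
     * existing mapping; in my experience it fails outright. */
    void open_same_handle_twice(cudaIpcMemHandle_t handle)
    {
        void *p1 = NULL, *p2 = NULL;

        cudaError_t e1 = cudaIpcOpenMemHandle(&p1, handle,
                                              cudaIpcMemLazyEnablePeerAccess);
        cudaError_t e2 = cudaIpcOpenMemHandle(&p2, handle,
                                              cudaIpcMemLazyEnablePeerAccess);

        printf("first open:  %s\n", cudaGetErrorString(e1)); /* cudaSuccess */
        printf("second open: %s\n", cudaGetErrorString(e2)); /* an error   */

        if (e1 == cudaSuccess)
            cudaIpcCloseMemHandle(p1);
    }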

For CUDA IPC, I was wondering whether MVAPICH checks if a given buffer has already had its memory handle opened and reuses the existing mapping, as opposed to potentially failing?  If this is not the case, could the situation be made more robust?  For example, check whether a given memory handle has already been opened and, if so, reuse it.  Similarly, if a handle is marked as having been opened by the calling application, defer the closing of that handle to the calling application as well.
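
Something along the following lines in the library's IPC path is what I have in mind.  This is purely a sketch; the cache structure and helper names are hypothetical, not existing MVAPICH internals:

    #include <stdlib.h>
    #include <string.h>
    #include <cuda_runtime.h>

    /* Hypothetical sketch of the suggested behaviour; the cache and the
     * helper names below are mine, not actual MVAPICH code. */
    typedef struct ipc_entry {
        cudaIpcMemHandle_t handle;    /* key: the peer's memory handle   */
        void              *mapped;    /* local mapping returned by open  */
        int                app_owned; /* opened by the application?      */
        struct ipc_entry  *next;
    } ipc_entry_t;

    static ipc_entry_t *ipc_cache = NULL;

    /* Return the existing mapping for `handle` if this process already
     * opened it; otherwise open it once and remember the mapping. */
    static void *ipc_open_cached(cudaIpcMemHandle_t handle)
    {
        for (ipc_entry_t *e = ipc_cache; e != NULL; e = e->next)
            if (memcmp(&e->handle, &handle, sizeof(handle)) == 0)
                return e->mapped;      /* reuse instead of re-opening */

        void *mapped = NULL;
        if (cudaIpcOpenMemHandle(&mapped, handle,
                                 cudaIpcMemLazyEnablePeerAccess) != cudaSuccess)
            return NULL;

        ipc_entry_t *e = (ipc_entry_t *)malloc(sizeof(*e));
        e->handle = handle;
        e->mapped = mapped;
        e->app_owned = 0;
        e->next = ipc_cache;
        ipc_cache = e;
        return mapped;
    }

    /* Close only mappings the library itself opened; if the application
     * opened the handle, leave the close to the application. */
    static void ipc_release(ipc_entry_t *e)
    {
        if (!e->app_owned)
            cudaIpcCloseMemHandle(e->mapped);
    }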

Thanks for your continued development of the MVAPICH library ☺

Kate.
