[mvapich-discuss] Mutex in mv2_get_path_rec_sl

Alexander Melnikov alex.i.melnikov at gmail.com
Thu Feb 13 00:16:38 EST 2020


The mv2_get_path_rec_sl function can be called simultaneously from two
different threads (the main thread and the cm_completion_handler thread),
so its accesses to shared resources (the SL cache and the read/write
buffer) must be protected by a mutex.
Call chains that lead to conflicts:
- PMPI_Init->MPIR_Init_thread->MPID_Init->MPIDI_CH3_Init->MPIDI_CH3I_CM_Init->MPIDI_CH3I_Exchange_Init_Info->MPIDI_CH3I_Ring_Exchange_Init_Info->rdma_setup_startup_ring->mv2_get_path_rec_sl
- cm_completion_handler->cm_handle_msg->cm_qp_move_to_rtr->mv2_get_path_rec_sl
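The damage these two call chains can do to a single shared buffer can be replayed deterministically on one thread. The sketch below is a hypothetical model, not MVAPICH2 code: `struct sa_buffer`, `simulated_sa_reply`, `replay_unlocked_interleaving` and the LID values are invented for illustration, and it assumes both callers write their query into, and read the reply from, the same buffer without a lock.

```c
#include <stdint.h>

/* Hypothetical model of one request/response buffer shared by every
 * caller; field names are illustrative, not MVAPICH2's. */
struct sa_buffer {
    uint16_t query_lid;  /* destination LID written by the caller */
    uint8_t  reply_sl;   /* SL written back by the (simulated) SA  */
};

/* Simulated SA: answers whatever query is currently in the buffer. */
static void simulated_sa_reply(struct sa_buffer *buf)
{
    buf->reply_sl = (uint8_t)(buf->query_lid % 16);
}

/* Replays, step by step, one interleaving that two unlocked threads
 * could produce. Returns the SL that caller A ends up reading. */
uint8_t replay_unlocked_interleaving(void)
{
    struct sa_buffer shared = {0, 0};

    shared.query_lid = 5;          /* A (main thread): query for LID 5     */
    shared.query_lid = 12;         /* B (CM thread): overwrites with 12    */
    simulated_sa_reply(&shared);   /* a single reply, computed for B       */
    return shared.reply_sl;        /* A reads SL 12 instead of SL 5        */
}
```

Caller A gets an answer to a query it never sent, and caller B may wait for a reply that A has already consumed; under the stated assumptions this is one plausible route to the symptoms below.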

In the MPI application, the following message appears:
- No response from SA
and the opensm log may contain the following error:
osm_vendor_send: ERR 5430: Send p_madw = 0x7f5ec8001c30 of size 120, Class 0x3, Method 0x81, Attr 0x35, TID 0x234103ca1200efbe failed -5 (Invalid argument)

The simplest solution is to protect the function with a mutex, as in the attached patch.
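The patch itself was scrubbed from the archive, so what follows is only a sketch of the general shape such a fix could take, not its contents. Everything named here (`get_sl_locked`, `query_sa_locked`, `run_two_callers`, the array-based cache, the simulated SA reply) is hypothetical; only the POSIX threads calls are real APIs. The mutex is statically initialized with `PTHREAD_MUTEX_INITIALIZER`, so it is usable from the first call on the `MPI_Init` path without a separate init step, and every return path leaves through a single unlock.

```c
#include <pthread.h>
#include <stdint.h>

#define CACHE_SIZE 64   /* hypothetical cache capacity */

/* Hypothetical stand-ins for the SL cache and the shared buffer. */
static pthread_mutex_t sl_mutex = PTHREAD_MUTEX_INITIALIZER;
static int      cache_valid[CACHE_SIZE];
static uint16_t cache_lid[CACHE_SIZE];
static uint8_t  cache_sl[CACHE_SIZE];
static uint16_t shared_query_lid;   /* one request/response buffer */
static uint8_t  shared_reply_sl;

/* Placeholder SA query through the shared buffer; LID 0 fails so the
 * error path can be exercised. Caller must hold sl_mutex. */
static int query_sa_locked(uint16_t lid, uint8_t *sl)
{
    if (lid == 0)
        return -1;
    shared_query_lid = lid;                               /* write request */
    shared_reply_sl = (uint8_t)(shared_query_lid % 16);   /* simulated SA  */
    *sl = shared_reply_sl;                                /* read reply    */
    return 0;
}

/* Lock on entry and unlock at one exit label, so the cache-hit return,
 * the SA-error return and the normal return all release the mutex. */
int get_sl_locked(uint16_t lid, uint8_t *sl)
{
    int rc = 0;
    unsigned idx = lid % CACHE_SIZE;

    pthread_mutex_lock(&sl_mutex);
    if (cache_valid[idx] && cache_lid[idx] == lid) {   /* cache hit */
        *sl = cache_sl[idx];
        goto out;
    }
    if (query_sa_locked(lid, sl) != 0) {               /* SA failure */
        rc = -1;
        goto out;
    }
    cache_lid[idx] = lid;                              /* fill the cache */
    cache_sl[idx] = *sl;
    cache_valid[idx] = 1;
out:
    pthread_mutex_unlock(&sl_mutex);
    return rc;
}

/* Returns 1 if no code path left sl_mutex locked. */
int sl_mutex_is_free(void)
{
    if (pthread_mutex_trylock(&sl_mutex) != 0)
        return 0;
    pthread_mutex_unlock(&sl_mutex);
    return 1;
}

/* Thread body: counts answers that do not match its own query. */
static void *caller(void *arg)
{
    uint16_t base = *(uint16_t *)arg;
    intptr_t bad = 0;
    for (int i = 0; i < 100000; i++) {
        uint16_t lid = (uint16_t)(base + i % 200);
        uint8_t sl = 0;
        if (get_sl_locked(lid, &sl) != 0 || sl != lid % 16)
            bad++;
    }
    return (void *)bad;
}

/* Two concurrent callers, loosely mirroring the startup-ring path and
 * the CM completion handler thread. Returns the number of wrong answers. */
long run_two_callers(void)
{
    uint16_t base_main = 1, base_cm = 101;
    pthread_t t_main, t_cm;
    void *r_main, *r_cm;

    pthread_create(&t_main, NULL, caller, &base_main);
    pthread_create(&t_cm, NULL, caller, &base_cm);
    pthread_join(t_main, &r_main);
    pthread_join(t_cm, &r_cm);
    return (long)((intptr_t)r_main + (intptr_t)r_cm);
}
```

Holding the lock across the SA query serializes the two callers, so the CM completion handler may wait for an in-flight query from the main thread. With a single shared read/write buffer that serialization is exactly what is needed, which is why a single mutex is the simplest fix rather than the most concurrent one.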
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mvapich2-2.3.2-sl.patch
Type: text/x-patch
Size: 1343 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20200213/0473869e/attachment.bin>

