[mvapich-discuss] Mutex in mv2_get_path_rec_sl

Subramoni, Hari subramoni.1 at osu.edu
Thu Feb 13 00:30:44 EST 2020


Hi, Alex.

Thanks for finding this out and providing the patch. We appreciate it. We will take it in with an acknowledgement to you. It should be available with the next release of MVAPICH2.

Best,
Hari.

From: mvapich-discuss-bounces at cse.ohio-state.edu <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> On Behalf Of Alexander Melnikov
Sent: Thursday, February 13, 2020 12:17 AM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] Mutex in mv2_get_path_rec_sl

The mv2_get_path_rec_sl function can be called simultaneously from two different threads (the main thread and cm_completion_handler). It therefore needs a mutex to protect its shared resources (the SL cache and the read/write buffer).
Call chains that lead to the conflict:
- PMPI_Init->MPIR_Init_thread->MPID_Init->MPIDI_CH3_Init->MPIDI_CH3I_CM_Init->MPIDI_CH3I_Exchange_Init_Info->MPIDI_CH3I_Ring_Exchange_Init_Info->rdma_setup_startup_ring->mv2_get_path_rec_sl
- cm_completion_handler->cm_handle_msg->cm_qp_move_to_rtr->mv2_get_path_rec_sl

In the MPI application, the following message appears:
- No response from SA
The opensm log may also show the following message:
osm_vendor_send: ERR 5430: Send p_madw = 0x7f5ec8001c30 of size 120, Class 0x3, Method 0x81, Attr 0x35, TID 0x234103ca1200efbe failed -5 (Invalid argument)

The simplest solution is to use a mutex, as in the attached patch.
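
For readers without access to the patch, here is a minimal sketch of the general idea. It is not the actual patch and not the MVAPICH2 code. The names get_path_rec_sl_locked, fake_sa_query, sl_cache, query_buf and path_rec_lock are all hypothetical stand-ins, and the SA query is replaced by a placeholder that derives the SL from the LID. It only shows one pthread mutex held across both the cache lookup and the use of the shared buffer, so that two threads cannot interleave inside the function.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the shared state named in the report:
 * an SL cache indexed by LID and a single read/write query buffer. */
#define SL_CACHE_SIZE 64
#define SL_UNKNOWN    (-1)

static int sl_cache[SL_CACHE_SIZE];
static unsigned char query_buf[256];
static int cache_ready = 0;
static pthread_mutex_t path_rec_lock = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder for the SA round trip (assumption: real code would send a
 * query and wait for a reply). It writes the LID into the shared buffer,
 * reads it back, and derives a fake SL from it. */
static int fake_sa_query(uint16_t dlid)
{
    memset(query_buf, 0, sizeof(query_buf));
    query_buf[0] = (unsigned char)(dlid & 0xff);
    query_buf[1] = (unsigned char)(dlid >> 8);
    uint16_t echoed = (uint16_t)(query_buf[0] | (query_buf[1] << 8));
    return echoed % 16;
}

/* The whole lookup, including cache update and buffer use, runs under
 * one lock, so concurrent callers are serialized. */
int get_path_rec_sl_locked(uint16_t dlid)
{
    int sl;

    if (dlid >= SL_CACHE_SIZE)
        return SL_UNKNOWN;

    pthread_mutex_lock(&path_rec_lock);
    if (!cache_ready) {
        for (int i = 0; i < SL_CACHE_SIZE; i++)
            sl_cache[i] = SL_UNKNOWN;
        cache_ready = 1;
    }
    sl = sl_cache[dlid];
    if (sl == SL_UNKNOWN) {
        sl = fake_sa_query(dlid);
        sl_cache[dlid] = sl;
    }
    pthread_mutex_unlock(&path_rec_lock);
    return sl;
}

/* Each worker looks up every LID many times and counts wrong answers. */
static void *worker(void *arg)
{
    long *mismatches = arg;
    for (int round = 0; round < 1000; round++)
        for (uint16_t lid = 0; lid < SL_CACHE_SIZE; lid++)
            if (get_path_rec_sl_locked(lid) != lid % 16)
                (*mismatches)++;
    return NULL;
}

/* Runs two workers concurrently, mimicking the main thread and the
 * completion handler thread. Returns the total number of mismatches,
 * or -1 if a thread could not be created. */
long demo_concurrent_lookups(void)
{
    pthread_t t1, t2;
    long m1 = 0, m2 = 0;

    if (pthread_create(&t1, NULL, worker, &m1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker, &m2) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return m1 + m2;
}
```

Calling demo_concurrent_lookups() from a program linked with -pthread should report no mismatches, because every access to the cache and the buffer happens while the lock is held. A single coarse lock is the simplest choice. It serializes all lookups, which seems acceptable here on the assumption that the function is called rarely, during connection setup.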

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20200213/cec7a128/attachment.html>

