[Mvapich-discuss] [MVAPICH2-2.3.7] Deadlock Issue with MV2_USE_BLOCKING in MVAPICH2-2.3.7

서푸름 purum5548 at konkuk.ac.kr
Tue Sep 9 02:42:02 EDT 2025


Dear MVAPICH Team,
Hello, I would like to report a deadlock related to the MV2_USE_BLOCKING runtime parameter in MVAPICH2 version 2.3.7.
To help you reproduce the issue, I have described the environment, the test method, the suspected cause, and a proposed fix below.

[Environment]
Homogeneous 2-node setup
Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
CPU: AMD Ryzen Threadripper 2950X (16-core processor)
OS: Ubuntu 20.04, kernel 5.15.104
MPI: MVAPICH2-2.3.7 (latest release)

[Test Method]
osu-micro-benchmarks-7.5, MPI_Igather() non-blocking benchmark
32 processes (16 processes on each node)
Increasing the number of iterations makes the deadlock easy to reproduce

[Reason & Solution]
Suspected Issue: Re-arming of the completion channel is not handled correctly

[Source Code]
Relevant source file: mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c
Function: static inline int perform_blocking_progress_for_ib(int hca_num, int num_cqs)
Suggested fix: call ibv_req_notify_cq() after acknowledging the completion events

You can view a proposed patch here:
https://urldefense.com/v3/__https://www.diffchecker.com/P4kKplpZ/__;!!KGKeukY!0f3xIInYTO4QfxtJvKP57TtDQLukDIQCvHyE1S-0H6SpgsdKdgNadntujhQnugfVUNeyKiu47y9pBfyEwxjsgjLHGkGRATouEg$ 

Thank you for your support.
Best regards,
purum.