[Mvapich-discuss] [MVAPICH2-2.3.7] Deadlock Issue with MV2_USE_BLOCKING in MVAPICH2-2.3.7
Panda, Dhabaleswar
panda at cse.ohio-state.edu
Mon Oct 6 19:02:31 EDT 2025
Hi Purum,
We have tested your patch, and it works. We have added it to MVAPICH2 2.3.7. The updated tarball (mvapich2-2.3.7-2) is available from the MVAPICH download page.
Thanks a lot for identifying the issue and contributing the patch!!
DK
________________________________________
From: Mvapich-discuss <mvapich-discuss-bounces+panda.2=osu.edu at lists.osu.edu> on behalf of Panda, Dhabaleswar via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Tuesday, September 9, 2025 5:32 AM
To: Announcement about MVAPICH (MPI over InfiniBand, RoCE, Omni-Path, Slingshot, iWARP and EFA) Libraries developed at NBCL/OSU; purum5548 at konkuk.ac.kr
Cc: 진현욱(Hyun-Wook Jin); 이종빈
Subject: Re: [Mvapich-discuss] [MVAPICH2-2.3.7] Deadlock Issue with MV2_USE_BLOCKING in MVAPICH2-2.3.7
Hi Purum,
Thanks for reporting this issue with the testing methodology and the patch. We will test it out.
Please note that the MVAPICH2 2.3.7 version is getting old. The latest is the 4.x series. Please start using the latest versions.
Thanks,
DK
________________________________________
From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf of 서푸름 via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Tuesday, September 9, 2025 2:42 AM
To: Announcement about MVAPICH (MPI over InfiniBand, RoCE, Omni-Path, Slingshot, iWARP and EFA) Libraries developed at NBCL/OSU
Cc: 진현욱(Hyun-Wook Jin); 이종빈
Subject: [Mvapich-discuss] [MVAPICH2-2.3.7] Deadlock Issue with MV2_USE_BLOCKING in MVAPICH2-2.3.7
Dear MVAPICH Team,
Hello, I would like to report a deadlock issue related to MV2_USE_BLOCKING in MVAPICH2 version 2.3.7.
To help reproduce the issue, I have detailed the environment, test method, and the suspected cause and solution below.
[Environment]
Homogeneous 2-node setup
Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
CPU: AMD Ryzen Threadripper 2950X (16-Core Processor)
OS: Kernel 5.15.104, Ubuntu 20.04
MPI: MVAPICH2-2.3.7 (latest release)
[Test Method]
osu-micro-benchmarks-7.5, MPI_Igather() non-blocking benchmark
32 processes (16 processes on each node)
Increasing the iteration count easily reproduces the deadlock
[Reason & Solution]
Suspected Issue: Re-arming of the completion channel is not handled correctly
[Source Code]
Relevant Source File : mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c
Function : static inline int perform_blocking_progress_for_ib(int hca_num, int num_cqs)
Suggested Fix: ibv_req_notify_cq() should be called after acknowledging the completion events
You can view a proposed patch here:
https://www.diffchecker.com/P4kKplpZ/
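The ordering in the suggested fix can be sketched as below. This is a minimal illustration only: it is not the linked patch and not MVAPICH2 code. It follows the sequence shown in the libibverbs ibv_get_cq_event documentation (wait for the event, acknowledge it, request the next notification, then empty the CQ). The cq_ops table, blocking_progress_once(), and the mock_* helpers are hypothetical names introduced here so the sequence can be exercised without InfiniBand hardware; the comments note which verbs call each callback is assumed to wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table standing in for the verbs calls.
 * In real code these are assumed to wrap:
 *   wait_event -> ibv_get_cq_event()   (blocks on the completion channel)
 *   ack_event  -> ibv_ack_cq_events()
 *   rearm      -> ibv_req_notify_cq()
 *   poll_one   -> ibv_poll_cq() with num_entries = 1
 */
struct cq_ops {
    int  (*wait_event)(void *ctx); /* 0 on success, -1 on error */
    void (*ack_event)(void *ctx);
    int  (*rearm)(void *ctx);      /* 0 on success, nonzero on error */
    int  (*poll_one)(void *ctx);   /* 1 = completion, 0 = empty, <0 = error */
};

/* One round of blocking progress: wait, ack, re-arm, then drain.
 * Returns the number of completions drained, or -1 on error. */
static int blocking_progress_once(const struct cq_ops *ops, void *ctx)
{
    int drained = 0;
    int rc;

    if (ops->wait_event(ctx) != 0)
        return -1;
    ops->ack_event(ctx);                  /* acknowledge the event first */
    if (ops->rearm(ctx) != 0)             /* then request the next notification */
        return -1;
    while ((rc = ops->poll_one(ctx)) > 0) /* finally empty the CQ */
        drained++;
    return rc < 0 ? -1 : drained;
}

/* Mock CQ used only to record and check the call order. */
struct mock_cq {
    int    pending;    /* completions waiting in the queue */
    char   trace[32];  /* call order: W(ait), A(ck), R(e-arm), P(oll) */
    size_t len;
};

static void mock_log(struct mock_cq *m, char c)
{
    if (m->len + 1 < sizeof m->trace) {
        m->trace[m->len++] = c;
        m->trace[m->len] = '\0';
    }
}

static int  mock_wait(void *ctx)  { mock_log(ctx, 'W'); return 0; }
static void mock_ack(void *ctx)   { mock_log(ctx, 'A'); }
static int  mock_rearm(void *ctx) { mock_log(ctx, 'R'); return 0; }

static int mock_poll(void *ctx)
{
    struct mock_cq *m = ctx;
    mock_log(m, 'P');
    if (m->pending == 0)
        return 0;
    m->pending--;
    return 1;
}

static const struct cq_ops mock_ops = {
    mock_wait, mock_ack, mock_rearm, mock_poll
};
```

Calling blocking_progress_once(&mock_ops, &m) on a mock_cq with two pending completions records the calls in wait, ack, re-arm, poll order. Re-arming before the final drain means a completion that lands while the queue is being emptied is either picked up by a later poll or raises a fresh event, so the next blocking wait is not left without a wake-up.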
Thank you for your support.
Best regards,
purum.
_______________________________________________
Mvapich-discuss mailing list
Mvapich-discuss at lists.osu.edu
https://lists.osu.edu/mailman/listinfo/mvapich-discuss