[mvapich-discuss] MVAPICH2 2.3rc2 XRC bug fix details

Subramoni, Hari subramoni.1 at osu.edu
Sun May 6 20:52:12 EDT 2018


Hi, Liwei.

XRC and RoCE do indeed work together. There was a small bug on the MVAPICH2 side that was leading to the issue. Can you please apply the following patch on top of MVAPICH2 2.3rc2 and see if it solves the issues you observe? Note that you need to run with MV2_USE_RoCE=1 and MV2_USE_XRC=1 set for MVAPICH2 to use XRC in RoCE mode. The environment variable names are case sensitive.
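
For example, with the mpirun_rsh launcher a run would look something like the following (host1, host2, and ./a.out are placeholders for your hosts and application):

mpirun_rsh -np 2 host1 host2 MV2_USE_RoCE=1 MV2_USE_XRC=1 ./a.out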

The following publication gives the details of how MVAPICH2 uses XRC.

https://ieeexplore.ieee.org/document/4663773/

For scalability, I would recommend using our hybrid communication channel, which uses UD and RC/XRC in appropriate combinations. This should give you better scalability than using XRC or RC alone. The following section of the MVAPICH2 userguide has more information; an illustrative command follows the link.

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3rc2-userguide.html#x1-700006.11
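
As an illustration, a hybrid run could be launched as follows (MV2_USE_UD_HYBRID and MV2_HYBRID_ENABLE_THRESHOLD are the control parameters described in that section; the process count and threshold value below are placeholders, so please check the defaults for your version):

mpirun_rsh -np 1024 -hostfile hosts MV2_USE_UD_HYBRID=1 MV2_HYBRID_ENABLE_THRESHOLD=512 ./a.out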

However, if your system has the software/drivers and HCAs capable of supporting DCT, then I would recommend the MVAPICH2-X library, which has support for DCT. The hybrid communication channel in MVAPICH2-X uses UD and DC/RC/XRC in appropriate combinations. The following section of the MVAPICH2-X userguide has more information on the DC feature and how to enable it.

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-x-2.3b-userguide.html#x1-370008.1
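
If memory serves, the DC feature is enabled with MV2_USE_DC=1 (please verify against the userguide section above), so a run would look roughly like:

mpirun_rsh -np 1024 -hostfile hosts MV2_USE_DC=1 ./a.out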

You can download the MVAPICH2-X package from our download page at the following link. If you do not see an RPM package appropriate to your system setup (Mellanox OFED version, distro, compiler), please e-mail us and we can generate one for you.

http://mvapich.cse.ohio-state.edu/downloads/

Please let us know if you have any follow-up questions.
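
In short, the patch records the remote rank's hostid in the XRC connection-management structure (ud_cm[tgt_rank].xrc_hostid) on a code path where, when XRC was enabled, it was previously left unset.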

diff --git a/src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_init.c b/src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_init.c
index 7ef1d8a..5fcb8a7 100644
--- a/src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_init.c
+++ b/src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_init.c
@@ -1400,6 +1400,11 @@ int MPIDI_CH3I_PMI_Get_Init_Info(MPIDI_PG_t * pg, int tgt_rank,
             &(pg->ch.mrail->cm_shmem.ud_cm[tgt_rank].arch_hca_type),
             &hostid, &(pg->ch.mrail->cm_shmem.ud_cm[tgt_rank].cm_gid.global.subnet_prefix),
             &(pg->ch.mrail->cm_shmem.ud_cm[tgt_rank].cm_gid.global.interface_id));
+#ifdef _ENABLE_XRC_
+        if (USE_XRC) {
+            pg->ch.mrail->cm_shmem.ud_cm[tgt_rank].xrc_hostid = hostid;
+        }
+#endif /* _ENABLE_XRC_ */
     } else {
#ifdef _ENABLE_XRC_
         if (USE_XRC) {

Thx,
Hari.

From: mvapich-discuss-bounces at cse.ohio-state.edu On Behalf Of Liwei Peng
Sent: Tuesday, May 1, 2018 10:54 PM
To: Subramoni, Hari <subramoni.1 at osu.edu>
Cc: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: Re: [mvapich-discuss] MVAPICH2 2.3rc2 XRC bug fix details

Thanks Hari for the quick response.

A few more questions:
1) Can you share some details on how MVAPICH2's XRC feature is used on IB?
2) Our goal is to reduce the number of queue pair connections in large-scale cluster environments. Besides XRC/DCT, what other technologies can we leverage?

Thanks,

Liwei



On Tue, May 1, 2018 at 10:17 AM, Subramoni, Hari <subramoni.1 at osu.edu> wrote:
Hi, Liwei.

This is different from the issue that was reported on mvapich-discuss a few days ago. We are still investigating whether this combination is expected to work. We will get back to you soon with more details.

Thx,
Hari.

From: mvapich-discuss-bounces at cse.ohio-state.edu On Behalf Of Liwei Peng
Sent: Tuesday, May 1, 2018 12:09 PM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] MVAPICH2 2.3rc2 XRC bug fix details

Hi MVAPICH experts,

I am evaluating the latest MVAPICH2 2.3rc2 release. The changelog lists a bug fix, "Fix issue with XRC connection establishment". Can you provide more details on this fix?

When I used MVAPICH2 2.3rc1 on Mellanox ConnectX-4 RoCE last week, I found that XRC caused MPI to hang. Is the bug fix related to this issue?

Thanks,

Liwei

