[mvapich-discuss] MVAPICH2 2.3rc2 XRC bug fix details

Liwei Peng liwei.peng at gmail.com
Mon May 7 13:37:07 EDT 2018


Hi Hari,

Thanks for the quick fix and detailed answers on XRC and DCT.

I tried your patch with MVAPICH2 2.3rc2. With MV2_USE_RoCE=1 and
MV2_USE_XRC=1, the MPI program now runs smoothly.

However, I found that when XRC is used, the number of connections didn't drop
but went up: without XRC it used 32 QPs, with XRC it used 40 QPs. Can you
take a look at what could be wrong? Thanks.

The following is what I tried:

  ./mpirun_rsh -ssh -np 8 -hostfile ~/m2.txt MV2_USE_RoCE=1 MV2_DEFAULT_GID_INDEX=3 MV2_USE_XRC=0 /home/liwei.peng/mvapich2-2.3rc2-prod/libexec/osu-micro-benchmarks/mpi/collective/osu_alltoall -i 1000
    number of QPs = 32

  ./mpirun_rsh -ssh -np 8 -hostfile ~/m2.txt MV2_USE_RoCE=1 MV2_DEFAULT_GID_INDEX=3 MV2_USE_XRC=1 /home/liwei.peng/mvapich2-2.3rc2-prod/libexec/osu-micro-benchmarks/mpi/collective/osu_alltoall -i 1000
    number of QPs = 40

  m2.txt has 2 machines:
    10.65.248.131
    10.65.248.151
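
For what it's worth, 32 would match one RC QP per process per remote peer
(8 ranks x 4 inter-node peers each), so I would expect XRC to lower the count
rather than raise it. In case the counting method matters, one way to snapshot
per-node QP usage before and during the run is the iproute2 "rdma" tool; the
loop below is just a sketch on my side (it counts every QP on the host,
including kernel/driver QPs, so only the difference between snapshots is
meaningful):

  # snapshot the total QP count on each node (run once before and once during the job)
  for h in 10.65.248.131 10.65.248.151; do
      ssh $h 'rdma resource show qp | wc -l'
  done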



On Sun, May 6, 2018 at 5:52 PM, Subramoni, Hari <subramoni.1 at osu.edu> wrote:

> Hi, Liwei.
>
>
>
> XRC and RoCE do indeed work together. There was a small bug on the MVAPICH2
> side that was leading to the issue. Can you please apply the following patch
> on top of MVAPICH2 2.3rc2 and see if it solves the issues you observe? Note
> that you need to run after setting MV2_USE_RoCE=1 and MV2_USE_XRC=1 for
> MVAPICH2 to use XRC in RoCE mode. The environment variables are case
> sensitive.
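>
> If you want to double-check that the settings actually took effect, a quick
> sketch is to add MV2_SHOW_ENV_INFO to the run line so MVAPICH2 prints its
> runtime parameters at startup (parameter name from memory -- please verify it
> against the userguide if it is not recognized; benchmark path abbreviated):
>
>   ./mpirun_rsh -ssh -np 8 -hostfile ~/m2.txt MV2_USE_RoCE=1 MV2_USE_XRC=1 MV2_SHOW_ENV_INFO=2 ./osu_alltoall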
>
>
>
> The following publication gives the details of how MVAPICH2 uses XRC.
>
>
>
> https://ieeexplore.ieee.org/document/4663773/
>
>
>
> For scalability, I would recommend using our hybrid communication channel
> which uses UD and RC/XRC in appropriate combinations. This should give you
> better scalability than just using XRC or RC alone. The following section
> of the MVAPICH2 userguide has more information on this.
>
>
>
> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3rc2-userguide.html#x1-700006.11
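>
> As a rough sketch (assuming MVAPICH2 was built with hybrid support; please
> verify the exact parameter names and defaults against the section above, I am
> writing them from memory), the hybrid channel can be forced on even for a
> small job with something like:
>
>   ./mpirun_rsh -ssh -np 8 -hostfile ~/m2.txt MV2_USE_UD_HYBRID=1 MV2_HYBRID_ENABLE_THRESHOLD=1 MV2_USE_XRC=1 ./osu_alltoall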
>
>
>
> However, if your system has the software/drivers and HCAs capable of
> supporting DCT, then I would recommend that you use MVAPICH2-X library
> which has support for DCT. The hybrid communication channel in MVAPICH2-X
> uses UD and DC/RC/XRC in appropriate combinations. The following section of
> the MVAPICH2-X userguide has more information on the DC feature in
> MVAPICH2-X and how to enable it.
>
>
>
> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-x-2.3b-userguide.html#x1-370008.1
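>
> As with the hybrid channel above, DC should be selectable at run time once you
> are on MVAPICH2-X. The variable name below (MV2_USE_DC) is my recollection
> only, so please confirm it against the section linked above:
>
>   # MV2_USE_DC=1 is assumed here -- check the MVAPICH2-X userguide for the exact name
>   ./mpirun_rsh -ssh -np 8 -hostfile ~/m2.txt MV2_USE_DC=1 ./osu_alltoall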
>
>
>
> You can download the MVAPICH2-X package from our download page available
> at the following link. If you do not see the RPM package appropriate to
> your system setup (Mellanox OFED, Distro, Compiler), please e-mail us and
> we can generate the appropriate package for you.
>
>
>
> http://mvapich.cse.ohio-state.edu/downloads/
>
>
>
> Please let us know if you have any follow-up questions.
>
>
>
> diff --git a/src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_init.c b/src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_init.c
> index 7ef1d8a..5fcb8a7 100644
> --- a/src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_init.c
> +++ b/src/mpid/ch3/channels/mrail/src/gen2/rdma_iba_init.c
> @@ -1400,6 +1400,11 @@ int MPIDI_CH3I_PMI_Get_Init_Info(MPIDI_PG_t * pg, int tgt_rank,
>              &(pg->ch.mrail->cm_shmem.ud_cm[tgt_rank].arch_hca_type),
>              &hostid, &(pg->ch.mrail->cm_shmem.ud_cm[tgt_rank].cm_gid.global.subnet_prefix),
>              &(pg->ch.mrail->cm_shmem.ud_cm[tgt_rank].cm_gid.global.interface_id));
> +#ifdef _ENABLE_XRC_
> +        if (USE_XRC) {
> +            pg->ch.mrail->cm_shmem.ud_cm[tgt_rank].xrc_hostid = hostid;
> +        }
> +#endif /* _ENABLE_XRC_ */
>      } else {
> #ifdef _ENABLE_XRC_
>          if (USE_XRC) {
>
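> In case it helps, the patch can be applied from the top of the 2.3rc2 source
> tree and the library rebuilt as usual; a minimal sketch (the patch file name
> is just an example):
>
>   # save the diff above as xrc-roce-fix.patch (illustrative name), then:
>   patch -p1 < xrc-roce-fix.patch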
>
>
> Thx,
>
> Hari.
>
>
>
> *From:* mvapich-discuss-bounces at cse.ohio-state.edu *On Behalf Of *Liwei
> Peng
> *Sent:* Tuesday, May 1, 2018 10:54 PM
> *To:* Subramoni, Hari <subramoni.1 at osu.edu>
> *Cc:* mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
> *Subject:* Re: [mvapich-discuss] MVAPICH2 2.3rc2 XRC bug fix details
>
>
>
> Thanks Hari for the quick response.
>
>
>
> A few more questions:
>
> 1) Can you share some details on how MVAPICH2's XRC feature is used on IB?
>
> 2) Our purpose is to reduce the number of queue pair connections in large-scale
> cluster environments. Besides XRC/DCT, what other technologies can we leverage?
>
>
>
> Thanks,
>
>
>
> Liwei
>
>
>
>
>
>
>
> On Tue, May 1, 2018 at 10:17 AM, Subramoni, Hari <subramoni.1 at osu.edu>
> wrote:
>
> Hi, Liwei.
>
>
>
> This is different from the issue that was reported on mvapich-discuss a
> few days ago. We are still trying to investigate whether this combination
> is expected to work. We will get back to you soon with more details.
>
>
>
> Thx,
>
> Hari.
>
>
>
> *From:* mvapich-discuss-bounces at cse.ohio-state.edu *On Behalf Of *Liwei
> Peng
> *Sent:* Tuesday, May 1, 2018 12:09 PM
> *To:* mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
> *Subject:* [mvapich-discuss] MVAPICH2 2.3rc2 XRC bug fix details
>
>
>
> Hi MVAPICH experts,
>
>
>
> I am evaluating the latest MVAPICH2 2.3rc2 release. From the changelog, it
> includes a bug fix, "Fix issue with XRC connection establishment". Can you
> provide more details on this bug fix?
>
>
>
> When I used MVAPICH2 2.3rc1 on Mellanox ConnectX-4 RoCE last week, I found
> that XRC would cause MPI to hang. Is this bug fix related to that issue?
>
>
>
> Thanks,
>
>
>
> Liwei
>
>
>
>
>

