[mvapich-discuss] mvapich2-2.3.3 over connectX-5 regression issue
Honggang LI
honli at redhat.com
Fri Apr 10 23:44:44 EDT 2020
On Sat, Apr 11, 2020 at 12:26:52AM +0000, Subramoni, Hari wrote:
> Hi, Honggang.
>
> Glad to know that it works for you. I am still trying to understand how changing the launcher changes the IB HCA selection behavior in MVAPICH2. To the best of my knowledge, the two do not have any interaction.
>
> If you don't mind, can you let us know the following
>
> 1. output of ibstat on both nodes
[root at rdma-virt-02 ~]$ ibstat
CA 'mlx5_1'
CA type: MT4115
Number of ports: 1
Firmware version: 12.25.1020
Hardware version: 0
Node GUID: 0xe41d2d0300e70ff7
System image GUID: 0xe41d2d0300e70ff6
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 38
LMC: 0
SM lid: 1
Capability mask: 0x2659e848
Port GUID: 0xe41d2d0300e70ff7
Link layer: InfiniBand
CA 'mlx5_0'
CA type: MT4115
Number of ports: 1
Firmware version: 12.25.1020
Hardware version: 0
Node GUID: 0xe41d2d0300e70ff6
System image GUID: 0xe41d2d0300e70ff6
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 19
LMC: 0
SM lid: 13
Capability mask: 0x2659e848
Port GUID: 0xe41d2d0300e70ff6
Link layer: InfiniBand
CA 'mlx5_bond_0'
CA type: MT4117
Number of ports: 1
Firmware version: 14.25.1020
Hardware version: 0
Node GUID: 0xe41d2d0300fda72a
System image GUID: 0xe41d2d0300fda72a
Port 1:
State: Active
Physical state: LinkUp
Rate: 25
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x00010000
Port GUID: 0xe61d2dfffefda72a
Link layer: Ethernet
[root at rdma-virt-02 ~]$
[root at rdma-virt-02 ~]$
[root at rdma-virt-02 ~]$
[root at rdma-virt-02 ~]$ ssh rdma-virt-03 ibstat
CA 'mlx5_bond_0'
CA type: MT4117
Number of ports: 1
Firmware version: 14.25.1020
Hardware version: 0
Node GUID: 0xe41d2d0300fda736
System image GUID: 0xe41d2d0300fda736
Port 1:
State: Active
Physical state: LinkUp
Rate: 25
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x00010000
Port GUID: 0xe61d2dfffefda736
Link layer: Ethernet
CA 'mlx5_1'
CA type: MT4115
Number of ports: 1
Firmware version: 12.25.1020
Hardware version: 0
Node GUID: 0xe41d2d0300e70e87
System image GUID: 0xe41d2d0300e70e86
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 30
LMC: 0
SM lid: 1
Capability mask: 0x2659e848
Port GUID: 0xe41d2d0300e70e87
Link layer: InfiniBand
CA 'mlx5_0'
CA type: MT4115
Number of ports: 1
Firmware version: 12.25.1020
Hardware version: 0
Node GUID: 0xe41d2d0300e70e86
System image GUID: 0xe41d2d0300e70e86
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 20
LMC: 0
SM lid: 13
Capability mask: 0x2659e848
Port GUID: 0xe41d2d0300e70e86
Link layer: InfiniBand
> 2. what do you mean by IPoIB was configured on mlx5_0?
172.31.0.202 is the IPoIB address on the port of mlx5_0; compare the
hardware address and the port GUID below.
[root at rdma-virt-02 ~]$ ip addr show mlx5_ib0
8: mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
link/infiniband 00:00:0b:ae:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:e7:0f:f6 brd
^^^^^^^^
inet 172.31.0.202/24 brd 172.31.0.255 scope global dynamic noprefixroute mlx5_ib0
^^^^^^^^^^^^^
[root at rdma-virt-02 ~]$ ibstat mlx5_0
CA 'mlx5_0'
CA type: MT4115
Number of ports: 1
Firmware version: 12.25.1020
Hardware version: 0
Node GUID: 0xe41d2d0300e70ff6
System image GUID: 0xe41d2d0300e70ff6
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 19
LMC: 0
SM lid: 13
Capability mask: 0x2659e848
Port GUID: 0xe41d2d0300e70ff6
^^^^^^^
Link layer: InfiniBand
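The same mapping can be read directly from sysfs instead of eyeballing GUIDs. A minimal sketch, assuming the standard mlx5 sysfs layout (each RDMA device's PCI function exposes its netdevs under device/net); `rdma link show` from iproute2, or the ibdev2netdev script shipped with MLNX OFED, should report the same pairing:

```shell
# Print each RDMA device with the netdev(s) on the same PCI function;
# for an IPoIB port this pairs e.g. mlx5_0 with mlx5_ib0.
found=0
for hca in /sys/class/infiniband/*; do
    [ -e "$hca" ] || continue
    found=1
    dev=$(basename "$hca")
    nets=$(ls "$hca/device/net" 2>/dev/null | tr '\n' ' ')
    printf '%s -> %s\n' "$dev" "${nets:-<no netdev>}"
done
# On a host without RDMA hardware the glob matches nothing:
[ "$found" -eq 1 ] || echo "no RDMA devices on this host"
```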
>
> Thx,
> Hari.
>
> -----Original Message-----
> From: Honggang LI <honli at redhat.com>
> Sent: Friday, April 10, 2020 7:35 PM
> To: Subramoni, Hari <subramoni.1 at osu.edu>
> Cc: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
> Subject: Re: [mvapich-discuss] mvapich2-2.3.3 over connectX-5 regression issue
>
> On Fri, Apr 10, 2020 at 01:52:20PM +0000, Subramoni, Hari wrote:
> > Hi, Honggang.
> >
> > It looks like your systems have multiple network adapters that have been setup with different modes (IB and Ethernet). In such a scenario, I would recommend explicitly setting the network adapter you want MVAPICH2 to use.
> >
> > e.g. MV2_IBA_HCA=mlx5_0 or MV2_IBA_HCA=mlx5_1
>
> MV2_IBA_HCA=mlx5_1 works for both mpirun and mpirun_rsh. IPoIB had been configured on mlx5_0. It seems mpirun and mpirun_rsh blindly pick up the first HCA port.
>
> The workaround works, but this is still a regression, because 2.3.2 does not need the workaround.
>
> Thanks
>
> [root at rdma-virt-02 ~]$ rpm -qf /usr/lib64/mvapich2/bin/mpirun
> mvapich2-2.3.3-1.el8.x86_64
>
> [root at rdma-virt-02 ~]$ cat hfile_one_core
> 172.31.0.202
> 172.31.0.203
>
> [root at rdma-virt-02 ~]$ ip addr show | grep -w 172.31.0.202
> inet 172.31.0.202/24 brd 172.31.0.255 scope global dynamic noprefixroute mlx5_ib0
>
> [root at rdma-virt-02 ~]$ ip addr show mlx5_ib0
> 8: mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
> link/infiniband 00:00:0b:ae:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:e7:0f:f6 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
> inet 172.31.0.202/24 brd 172.31.0.255 scope global dynamic noprefixroute mlx5_ib0
> valid_lft 2039sec preferred_lft 2039sec
> inet6 fe80::e61d:2d03:e7:ff6/64 scope link noprefixroute
> valid_lft forever preferred_lft forever
>
> [root at rdma-virt-02 ~]$ ibstat mlx5_0
> CA 'mlx5_0'
> CA type: MT4115
> Number of ports: 1
> Firmware version: 12.25.1020
> Hardware version: 0
> Node GUID: 0xe41d2d0300e70ff6
> System image GUID: 0xe41d2d0300e70ff6
> Port 1:
> State: Active
> Physical state: LinkUp
> Rate: 56
> Base lid: 19
> LMC: 0
> SM lid: 13
> Capability mask: 0x2659e848
> Port GUID: 0xe41d2d0300e70ff6 <===
> Link layer: InfiniBand
>
> According to the "link/infiniband" hardware address and the port GUID, the HCA is mlx5_0.
>
> [root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun -genv MV2_IBA_HCA=mlx5_0 -np 2 -hostfile /root/hfile_one_core hostname
> rdma-virt-02.lab.bos.redhat.com
> rdma-virt-03.lab.bos.redhat.com
>
> [root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun -genv MV2_IBA_HCA=mlx5_0 -np 2 -hostfile /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
> (hangs with no output, like mpirun_rsh)
>
>
> [root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun -genv MV2_IBA_HCA=mlx5_1 -np 2 -hostfile /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
> # OSU MPI Latency Test v5.4.1
> # Size Latency (us)
> 0 1.03
> 1 1.08
> 2 1.07
> 4 1.07
> 8 1.07
> 16 1.11
> 32 1.11
> 64 1.13
> 128 1.19
> 256 1.59
> 512 1.68
> 1024 1.84
> 2048 2.19
> 4096 2.95
> 8192 4.40
> 16384 5.57
> 32768 7.21
> 65536 9.89
> 131072 15.24
> 262144 26.00
> 524288 47.65
> 1048576 91.39
> 2097152 177.73
> 4194304 351.36
>
> After adding 'export MV2_IBA_HCA=mlx5_1' to ~/.bashrc on both machines, mpirun_rsh works.
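For reference, the workaround amounts to one line on every node. mpirun_rsh starts remote processes over ssh, so the variable has to live in a file sourced by non-interactive shells; per the MVAPICH2 user guide it should also be possible to pass MV2_IBA_HCA=mlx5_1 directly on the mpirun_rsh command line before the executable:

```shell
# ~/.bashrc on rdma-virt-02 and rdma-virt-03
export MV2_IBA_HCA=mlx5_1
```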
>
> [root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun_rsh -np 2 -hostfile /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
> # OSU MPI Latency Test v5.4.1
> # Size Latency (us)
> 0 1.04
> 1 1.07
> 2 1.07
> 4 1.07
> 8 1.06
> 16 1.12
> 32 1.12
> 64 1.14
> 128 1.19
> 256 1.58
> 512 1.66
> 1024 1.82
> 2048 2.17
> 4096 2.93
> 8192 4.39
> 16384 5.52
> 32768 7.18
> 65536 9.87
> 131072 15.25
> 262144 25.97
> 524288 47.62
> 1048576 91.80
> 2097152 177.88
> 4194304 350.79
>
>
> > Best,
> > Hari.
> >
> > -----Original Message-----
> > From: mvapich-discuss-bounces at cse.ohio-state.edu
> > <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> On Behalf Of
> > Honggang LI
> > Sent: Friday, April 10, 2020 3:56 AM
> > To: mvapich-discuss at cse.ohio-state.edu
> > <mvapich-discuss at mailman.cse.ohio-state.edu>
> > Subject: [mvapich-discuss] mvapich2-2.3.3 over connectX-5 regression
> > issue
> >
> > hi
> >
> > short summary:
> > +----------+----------+-----------+
> > |mvapich2 | mpirun | mpirun_rsh|
> > |version | | |
> > +----------+----------+-----------+
> > |2.3.2 | works | hang |
> > +----------+----------+-----------+
> > |2.3.3 | failed | hang |
> > +----------+----------+-----------+
> >
> > Is it possible to run something like 'git bisect' to narrow down the source of the regression? It seems no public git repo is available,
> > and I don't know how to run 'git bisect' against the SVN repo.
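For what it's worth, 'git bisect' does not need the project's own git server; once the history is mirrored into git (for an SVN tree, e.g. via 'git svn clone'), bisect can run unattended with a test script. A toy sketch of the workflow on a throwaway repository (commit c4 plays the role of the first bad release; the real MVAPICH2 history and a real regression test would substitute for the fake ones here):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email you@example.com
git config user.name tester
# Five commits; pretend the "bug" (f holding a value >= 4) appeared in c4.
for i in 1 2 3 4 5; do
    echo "$i" > f
    git add f
    git commit -qm "c$i"
done
# Mark HEAD (c5) bad and HEAD~4 (c1) good, then let bisect drive the
# test command: exit 0 means good, non-zero means bad.
git bisect start HEAD HEAD~4
git bisect run sh -c 'test "$(cat f)" -lt 4' > /dev/null
first_bad=$(git show -s --format=%s refs/bisect/bad)
echo "first bad commit: $first_bad"   # → first bad commit: c4
git bisect reset > /dev/null
```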
> >
> > thanks
> >
> > [root at rdma-virt-02 ~]$ cat hfile_one_core
> > 172.31.0.202
> > 172.31.0.203
> > [root at rdma-virt-02 ~]$ ip addr show | grep -w 172.31.0.202
> > inet 172.31.0.202/24 brd 172.31.0.255 scope global dynamic
> > noprefixroute mlx5_ib0
> >
> >
> > [root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun -np 2 -hostfile
> > /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
> > [rdma-virt-02.lab.bos.redhat.com:mpi_rank_0][handle_cqe] Send desc
> > error in msg to 1, wc_opcode=0
> > [rdma-virt-02.lab.bos.redhat.com:mpi_rank_0][handle_cqe] Msg from 1:
> > wc.status=12, wc.wr_id=0x560c8bac9040, wc.opcode=0,
> > vbuf->phead->type=0 = MPIDI_CH3_PKT_EAGER_SEND
> > [rdma-virt-02.lab.bos.redhat.com:mpi_rank_0][handle_cqe]
> > src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:548: [] Got
> > completion with error 12, vendor code=0x81, dest rank=1
> > : Protocol not supported (93)
> > [rdma-virt-03.lab.bos.redhat.com:mpi_rank_1][handle_cqe] Send desc
> > error in msg to 0, wc_opcode=0
> > [rdma-virt-03.lab.bos.redhat.com:mpi_rank_1][handle_cqe] Msg from 0:
> > wc.status=12, wc.wr_id=0x563896cf9040, wc.opcode=0,
> > vbuf->phead->type=0 = MPIDI_CH3_PKT_EAGER_SEND
> > [rdma-virt-03.lab.bos.redhat.com:mpi_rank_1][handle_cqe]
> > src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:548: [] Got
> > completion with error 12, vendor code=0x81, dest rank=0
> > : Protocol not supported (93)
> >
> > [root at rdma-virt-02 ~]$ dnf downgrade mvapich2
> > Updating Subscription Management repositories.
> > Unable to read consumer identity
> > This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
> > Last metadata expiration check: 2:24:17 ago on Fri 10 Apr 2020 01:18:01 AM EDT.
> > Dependencies resolved.
> > ========================================================================================================================================
> > Package Architecture Version Repository Size
> > ========================================================================================================================================
> > Downgrading:
> > mvapich2 x86_64 2.3.2-2.el8 beaker-AppStream 3.1 M
> >
> > Transaction Summary
> > ========================================================================================================================================
> > Downgrade 1 Package
> >
> > Total download size: 3.1 M
> > Is this ok [y/N]: y
> > Downloading Packages:
> > mvapich2-2.3.2-2.el8.x86_64.rpm 39 MB/s | 3.1 MB 00:00
> > ----------------------------------------------------------------------------------------------------------------------------------------
> > Total 39 MB/s | 3.1 MB 00:00
> > Running transaction check
> > Transaction check succeeded.
> > Running transaction test
> > Transaction test succeeded.
> > Running transaction
> > Preparing : 1/1
> > Downgrading : mvapich2-2.3.2-2.el8.x86_64 1/2
> > Cleanup : mvapich2-2.3.3-1.el8.x86_64 2/2
> > Running scriptlet: mvapich2-2.3.3-1.el8.x86_64 2/2
> > Verifying : mvapich2-2.3.2-2.el8.x86_64 1/2
> > Verifying : mvapich2-2.3.3-1.el8.x86_64 2/2
> > Installed products updated.
> >
> > Downgraded:
> > mvapich2-2.3.2-2.el8.x86_64
> >
> > Complete!
> > [root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun -np 2 -hostfile
> > /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
> > # OSU MPI Latency Test v5.4.1
> > # Size Latency (us)
> > 0 1.24
> > 1 1.29
> > 2 1.29
> > 4 1.29
> > 8 1.29
> > 16 1.34
> > 32 1.35
> > 64 1.36
> > 128 1.42
> > 256 1.82
> > 512 1.92
> > 1024 2.11
> > 2048 2.53
> > 4096 3.48
> > 8192 5.19
> > 16384 7.37
> > 32768 10.12
> > 65536 15.00
> > 131072 24.69
> > 262144 44.15
> > 524288 82.97
> > 1048576 160.92
> > 2097152 316.19
> > 4194304 626.91
> > [root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun_rsh -np 2
> > -hostfile /root/hfile_one_core
> > /usr/lib64/mvapich2/bin/mpitests-osu_latency
> >
> > (hangs, no output)
> >
> > [root at rdma-virt-03 ~]$ ibstat
> > CA 'mlx5_bond_0'
> > CA type: MT4117
> > Number of ports: 1
> > Firmware version: 14.25.1020
> > Hardware version: 0
> > Node GUID: 0xe41d2d0300fda736
> > System image GUID: 0xe41d2d0300fda736
> > Port 1:
> > State: Active
> > Physical state: LinkUp
> > Rate: 25
> > Base lid: 0
> > LMC: 0
> > SM lid: 0
> > Capability mask: 0x00010000
> > Port GUID: 0xe61d2dfffefda736
> > Link layer: Ethernet
> > CA 'mlx5_1'
> > CA type: MT4115
> > Number of ports: 1
> > Firmware version: 12.25.1020
> > Hardware version: 0
> > Node GUID: 0xe41d2d0300e70e87
> > System image GUID: 0xe41d2d0300e70e86
> > Port 1:
> > State: Active
> > Physical state: LinkUp
> > Rate: 100
> > Base lid: 30
> > LMC: 0
> > SM lid: 1
> > Capability mask: 0x2659e848
> > Port GUID: 0xe41d2d0300e70e87
> > Link layer: InfiniBand
> > CA 'mlx5_0'
> > CA type: MT4115
> > Number of ports: 1
> > Firmware version: 12.25.1020
> > Hardware version: 0
> > Node GUID: 0xe41d2d0300e70e86
> > System image GUID: 0xe41d2d0300e70e86
> > Port 1:
> > State: Active
> > Physical state: LinkUp
> > Rate: 100
> > Base lid: 20
> > LMC: 0
> > SM lid: 13
> > Capability mask: 0x2659e848
> > Port GUID: 0xe41d2d0300e70e86
> > Link layer: InfiniBand
> >
> >
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss at cse.ohio-state.edu
> > http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
>