[mvapich-discuss] mvapich2-2.3.3 over connectX-5 regression issue

Honggang LI honli at redhat.com
Fri Apr 10 23:44:44 EDT 2020


On Sat, Apr 11, 2020 at 12:26:52AM +0000, Subramoni, Hari wrote:
> Hi, Honggang.
> 
> Glad to know that it works for you. I am still trying to understand how changing the launcher changes the IB HCA selection behavior in MVAPICH2. To the best of my knowledge, the two do not interact.
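> 
> A quick way to compare what each launcher actually selects would be to rerun both jobs with verbose parameter output (a sketch only; it assumes MV2_SHOW_ENV_INFO is supported by your build and that mpirun_rsh takes it as a VAR=value pair on the command line):
> 
>     /usr/lib64/mvapich2/bin/mpirun -genv MV2_SHOW_ENV_INFO=2 -np 2 -hostfile /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
>     /usr/lib64/mvapich2/bin/mpirun_rsh -np 2 -hostfile /root/hfile_one_core MV2_SHOW_ENV_INFO=2 /usr/lib64/mvapich2/bin/mpitests-osu_latency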
> 
> If you don't mind, could you let us know the following:
> 
> 1. output of ibstat on both nodes

[root at rdma-virt-02 ~]$ ibstat
CA 'mlx5_1'
	CA type: MT4115
	Number of ports: 1
	Firmware version: 12.25.1020
	Hardware version: 0
	Node GUID: 0xe41d2d0300e70ff7
	System image GUID: 0xe41d2d0300e70ff6
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 100
		Base lid: 38
		LMC: 0
		SM lid: 1
		Capability mask: 0x2659e848
		Port GUID: 0xe41d2d0300e70ff7
		Link layer: InfiniBand
CA 'mlx5_0'
	CA type: MT4115
	Number of ports: 1
	Firmware version: 12.25.1020
	Hardware version: 0
	Node GUID: 0xe41d2d0300e70ff6
	System image GUID: 0xe41d2d0300e70ff6
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 56
		Base lid: 19
		LMC: 0
		SM lid: 13
		Capability mask: 0x2659e848
		Port GUID: 0xe41d2d0300e70ff6
		Link layer: InfiniBand
CA 'mlx5_bond_0'
	CA type: MT4117
	Number of ports: 1
	Firmware version: 14.25.1020
	Hardware version: 0
	Node GUID: 0xe41d2d0300fda72a
	System image GUID: 0xe41d2d0300fda72a
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 25
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x00010000
		Port GUID: 0xe61d2dfffefda72a
		Link layer: Ethernet
[root at rdma-virt-02 ~]$
[root at rdma-virt-02 ~]$ ssh rdma-virt-03 ibstat
CA 'mlx5_bond_0'
	CA type: MT4117
	Number of ports: 1
	Firmware version: 14.25.1020
	Hardware version: 0
	Node GUID: 0xe41d2d0300fda736
	System image GUID: 0xe41d2d0300fda736
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 25
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x00010000
		Port GUID: 0xe61d2dfffefda736
		Link layer: Ethernet
CA 'mlx5_1'
	CA type: MT4115
	Number of ports: 1
	Firmware version: 12.25.1020
	Hardware version: 0
	Node GUID: 0xe41d2d0300e70e87
	System image GUID: 0xe41d2d0300e70e86
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 100
		Base lid: 30
		LMC: 0
		SM lid: 1
		Capability mask: 0x2659e848
		Port GUID: 0xe41d2d0300e70e87
		Link layer: InfiniBand
CA 'mlx5_0'
	CA type: MT4115
	Number of ports: 1
	Firmware version: 12.25.1020
	Hardware version: 0
	Node GUID: 0xe41d2d0300e70e86
	System image GUID: 0xe41d2d0300e70e86
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 100
		Base lid: 20
		LMC: 0
		SM lid: 13
		Capability mask: 0x2659e848
		Port GUID: 0xe41d2d0300e70e86
		Link layer: InfiniBand


> 2. what do you mean by "IPoIB was configured on mlx5_0"?
172.31.0.202 is the IPoIB address on port 1 of mlx5_0; compare the last
bytes of the link/infiniband hardware address with the port GUID in the
ibstat output below (a quicker cross-check is sketched after it).

[root at rdma-virt-02 ~]$ ip addr show mlx5_ib0
8: mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:0b:ae:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:e7:0f:f6 brd 
                                                                       ^^^^^^^^
    inet 172.31.0.202/24 brd 172.31.0.255 scope global dynamic noprefixroute mlx5_ib0
         ^^^^^^^^^^^^^

[root at rdma-virt-02 ~]$ ibstat mlx5_0
CA 'mlx5_0'
	CA type: MT4115
	Number of ports: 1
	Firmware version: 12.25.1020
	Hardware version: 0
	Node GUID: 0xe41d2d0300e70ff6
	System image GUID: 0xe41d2d0300e70ff6
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 56
		Base lid: 19
		LMC: 0
		SM lid: 13
		Capability mask: 0x2659e848
		Port GUID: 0xe41d2d0300e70ff6
		                      ^^^^^^^
		Link layer: InfiniBand
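
A quicker cross-check of the netdev-to-HCA mapping, without matching GUIDs
by hand (a sketch; it assumes the iproute2 'rdma' tool is installed and the
usual mlx5 sysfs layout):

    # list each RDMA port together with the netdev bound to it
    rdma link show
    # or read the HCA backing the IPoIB interface straight from sysfs
    ls /sys/class/net/mlx5_ib0/device/infiniband/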
> 
> Thx,
> Hari.
> 
> -----Original Message-----
> From: Honggang LI <honli at redhat.com> 
> Sent: Friday, April 10, 2020 7:35 PM
> To: Subramoni, Hari <subramoni.1 at osu.edu>
> Cc: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
> Subject: Re: [mvapich-discuss] mvapich2-2.3.3 over connectX-5 regression issue
> 
> On Fri, Apr 10, 2020 at 01:52:20PM +0000, Subramoni, Hari wrote:
> > Hi, Honggang.
> > 
> > It looks like your systems have multiple network adapters set up in different modes (IB and Ethernet). In such a scenario, I would recommend explicitly setting the network adapter you want MVAPICH2 to use.
> > 
> > e.g. MV2_IBA_HCA=mlx5_0 or MV2_IBA_HCA=mlx5_1
> 
> MV2_IBA_HCA=mlx5_1 works for both mpirun and mpirun_rsh. IPoIB is configured on mlx5_0. It seems mpirun and mpirun_rsh blindly pick the first HCA port.
> 
> The workaround works, but this is still a regression, because 2.3.2 does not need it.
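> 
> For reference, the workaround spelled out per launcher (a sketch; mpirun_rsh is assumed to accept VAR=value pairs on the command line before the executable, as described in the MVAPICH2 user guide, so no ~/.bashrc change is needed):
> 
>     /usr/lib64/mvapich2/bin/mpirun -genv MV2_IBA_HCA=mlx5_1 -np 2 -hostfile /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
>     /usr/lib64/mvapich2/bin/mpirun_rsh -np 2 -hostfile /root/hfile_one_core MV2_IBA_HCA=mlx5_1 /usr/lib64/mvapich2/bin/mpitests-osu_latency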
> 
> Thanks
> 
> [root at rdma-virt-02 ~]$ rpm -qf /usr/lib64/mvapich2/bin/mpirun
> mvapich2-2.3.3-1.el8.x86_64
> 
> [root at rdma-virt-02 ~]$ cat hfile_one_core
> 172.31.0.202
> 172.31.0.203
> 
> [root at rdma-virt-02 ~]$ ip addr show | grep -w 172.31.0.202
>     inet 172.31.0.202/24 brd 172.31.0.255 scope global dynamic noprefixroute mlx5_ib0
> 
> [root at rdma-virt-02 ~]$ ip addr show mlx5_ib0
> 8: mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
>     link/infiniband 00:00:0b:ae:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:e7:0f:f6 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
>     inet 172.31.0.202/24 brd 172.31.0.255 scope global dynamic noprefixroute mlx5_ib0
>        valid_lft 2039sec preferred_lft 2039sec
>     inet6 fe80::e61d:2d03:e7:ff6/64 scope link noprefixroute 
>        valid_lft forever preferred_lft forever
> 
> [root at rdma-virt-02 ~]$ ibstat mlx5_0
> CA 'mlx5_0'
> 	CA type: MT4115
> 	Number of ports: 1
> 	Firmware version: 12.25.1020
> 	Hardware version: 0
> 	Node GUID: 0xe41d2d0300e70ff6
> 	System image GUID: 0xe41d2d0300e70ff6
> 	Port 1:
> 		State: Active
> 		Physical state: LinkUp
> 		Rate: 56
> 		Base lid: 19
> 		LMC: 0
> 		SM lid: 13
> 		Capability mask: 0x2659e848
> 		Port GUID: 0xe41d2d0300e70ff6  <===
> 		Link layer: InfiniBand
> 
> According to the "link/infiniband" hardware address and the port GUID, the HCA backing this interface is mlx5_0.
> 
> [root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun -genv MV2_IBA_HCA=mlx5_0 -np 2 -hostfile /root/hfile_one_core hostname
> rdma-virt-02.lab.bos.redhat.com
> rdma-virt-03.lab.bos.redhat.com
> 
> [root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun -genv MV2_IBA_HCA=mlx5_0 -np 2 -hostfile /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
> (hangs like mpirun_rsh, no output)
> 
> 
> [root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun -genv MV2_IBA_HCA=mlx5_1 -np 2 -hostfile /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
> # OSU MPI Latency Test v5.4.1
> # Size          Latency (us)
> 0                       1.03
> 1                       1.08
> 2                       1.07
> 4                       1.07
> 8                       1.07
> 16                      1.11
> 32                      1.11
> 64                      1.13
> 128                     1.19
> 256                     1.59
> 512                     1.68
> 1024                    1.84
> 2048                    2.19
> 4096                    2.95
> 8192                    4.40
> 16384                   5.57
> 32768                   7.21
> 65536                   9.89
> 131072                 15.24
> 262144                 26.00
> 524288                 47.65
> 1048576                91.39
> 2097152               177.73
> 4194304               351.36
> 
> After adding 'export MV2_IBA_HCA=mlx5_1' to ~/.bashrc on both machines, mpirun_rsh works.
> 
> [root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun_rsh  -np 2 -hostfile /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
> # OSU MPI Latency Test v5.4.1
> # Size          Latency (us)
> 0                       1.04
> 1                       1.07
> 2                       1.07
> 4                       1.07
> 8                       1.06
> 16                      1.12
> 32                      1.12
> 64                      1.14
> 128                     1.19
> 256                     1.58
> 512                     1.66
> 1024                    1.82
> 2048                    2.17
> 4096                    2.93
> 8192                    4.39
> 16384                   5.52
> 32768                   7.18
> 65536                   9.87
> 131072                 15.25
> 262144                 25.97
> 524288                 47.62
> 1048576                91.80
> 2097152               177.88
> 4194304               350.79
> 
> 
> > Best,
> > Hari.
> > 
> > -----Original Message-----
> > From: mvapich-discuss-bounces at cse.ohio-state.edu 
> > <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> On Behalf Of 
> > Honggang LI
> > Sent: Friday, April 10, 2020 3:56 AM
> > To: mvapich-discuss at cse.ohio-state.edu 
> > <mvapich-discuss at mailman.cse.ohio-state.edu>
> > Subject: [mvapich-discuss] mvapich2-2.3.3 over connectX-5 regression 
> > issue
> > 
> > hi
> > 
> > short summary:
> > +----------+----------+-----------+
> > |mvapich2  | mpirun   | mpirun_rsh|
> > |version   |          |           |
> > +----------+----------+-----------+
> > |2.3.2     | works    | hang      |
> > +----------+----------+-----------+
> > |2.3.3     | failed   | hang      |
> > +----------+----------+-----------+
> > 
> > Is it possible to run something like 'git bisect' to narrow down the source of the regression? It seems no public git repo is available, and I don't know how to run 'git bisect' against an SVN repo.
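> > 
> > Lacking that, one low-tech way to narrow it down would be to diff the relevant channel code between the two release tarballs (a sketch; tarball and directory names are assumed, and the mrail path is taken from the error output below):
> > 
> > tar xf mvapich2-2.3.2.tar.gz
> > tar xf mvapich2-2.3.3.tar.gz
> > diff -ru mvapich2-2.3.2/src/mpid/ch3/channels/mrail mvapich2-2.3.3/src/mpid/ch3/channels/mrail > mrail.diff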
> > 
> > thanks
> > 
> > [root at rdma-virt-02 ~]$ cat hfile_one_core
> > 172.31.0.202
> > 172.31.0.203
> > [root at rdma-virt-02 ~]$ ip addr show | grep -w 172.31.0.202
> >     inet 172.31.0.202/24 brd 172.31.0.255 scope global dynamic 
> > noprefixroute mlx5_ib0
> > 
> > 
> > [root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun  -np 2 -hostfile /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
> > [rdma-virt-02.lab.bos.redhat.com:mpi_rank_0][handle_cqe] Send desc 
> > error in msg to 1, wc_opcode=0 
> > [rdma-virt-02.lab.bos.redhat.com:mpi_rank_0][handle_cqe] Msg from 1: 
> > wc.status=12, wc.wr_id=0x560c8bac9040, wc.opcode=0, 
> > vbuf->phead->type=0 = MPIDI_CH3_PKT_EAGER_SEND 
> > [rdma-virt-02.lab.bos.redhat.com:mpi_rank_0][handle_cqe] 
> > src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:548: [] Got 
> > completion with error 12, vendor code=0x81, dest rank=1
> > : Protocol not supported (93)
> > [rdma-virt-03.lab.bos.redhat.com:mpi_rank_1][handle_cqe] Send desc 
> > error in msg to 0, wc_opcode=0 
> > [rdma-virt-03.lab.bos.redhat.com:mpi_rank_1][handle_cqe] Msg from 0: 
> > wc.status=12, wc.wr_id=0x563896cf9040, wc.opcode=0, 
> > vbuf->phead->type=0 = MPIDI_CH3_PKT_EAGER_SEND 
> > [rdma-virt-03.lab.bos.redhat.com:mpi_rank_1][handle_cqe] 
> > src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:548: [] Got 
> > completion with error 12, vendor code=0x81, dest rank=0
> > : Protocol not supported (93)
> > 
> > [root at rdma-virt-02 ~]$ dnf downgrade mvapich2
> > Updating Subscription Management repositories.
> > Unable to read consumer identity
> > This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
> > Last metadata expiration check: 2:24:17 ago on Fri 10 Apr 2020 01:18:01 AM EDT.
> > Dependencies resolved.
> > ========================================================================================================================================
> >  Package                       Architecture                Version                          Repository                             Size
> > ========================================================================================================================================
> > Downgrading:
> >  mvapich2                      x86_64                      2.3.2-2.el8                      beaker-AppStream                      3.1 M
> > 
> > Transaction Summary
> > ========================================================================================================================================
> > Downgrade  1 Package
> > 
> > Total download size: 3.1 M
> > Is this ok [y/N]: y
> > Downloading Packages:
> > mvapich2-2.3.2-2.el8.x86_64.rpm                                                                          39 MB/s | 3.1 MB     00:00
> > ----------------------------------------------------------------------------------------------------------------------------------------
> > Total                                                                                                    39 MB/s | 3.1 MB     00:00
> > Running transaction check
> > Transaction check succeeded.
> > Running transaction test
> > Transaction test succeeded.
> > Running transaction
> >   Preparing        :                                                                                                                1/1
> >   Downgrading      : mvapich2-2.3.2-2.el8.x86_64                                                                                    1/2
> >   Cleanup          : mvapich2-2.3.3-1.el8.x86_64                                                                                    2/2
> >   Running scriptlet: mvapich2-2.3.3-1.el8.x86_64                                                                                    2/2
> >   Verifying        : mvapich2-2.3.2-2.el8.x86_64                                                                                    1/2
> >   Verifying        : mvapich2-2.3.3-1.el8.x86_64                                                                                    2/2
> > Installed products updated.
> > 
> > Downgraded:
> >   mvapich2-2.3.2-2.el8.x86_64
> > 
> > Complete!
> > [root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun  -np 2 -hostfile /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
> > # OSU MPI Latency Test v5.4.1
> > # Size          Latency (us)
> > 0                       1.24
> > 1                       1.29
> > 2                       1.29
> > 4                       1.29
> > 8                       1.29
> > 16                      1.34
> > 32                      1.35
> > 64                      1.36
> > 128                     1.42
> > 256                     1.82
> > 512                     1.92
> > 1024                    2.11
> > 2048                    2.53
> > 4096                    3.48
> > 8192                    5.19
> > 16384                   7.37
> > 32768                  10.12
> > 65536                  15.00
> > 131072                 24.69
> > 262144                 44.15
> > 524288                 82.97
> > 1048576               160.92
> > 2097152               316.19
> > 4194304               626.91
> > [root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun_rsh  -np 2 -hostfile /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
> > 
> > (hangs, no output)
> > 
> > [root at rdma-virt-03 ~]$ ibstat
> > CA 'mlx5_bond_0'
> > 	CA type: MT4117
> > 	Number of ports: 1
> > 	Firmware version: 14.25.1020
> > 	Hardware version: 0
> > 	Node GUID: 0xe41d2d0300fda736
> > 	System image GUID: 0xe41d2d0300fda736
> > 	Port 1:
> > 		State: Active
> > 		Physical state: LinkUp
> > 		Rate: 25
> > 		Base lid: 0
> > 		LMC: 0
> > 		SM lid: 0
> > 		Capability mask: 0x00010000
> > 		Port GUID: 0xe61d2dfffefda736
> > 		Link layer: Ethernet
> > CA 'mlx5_1'
> > 	CA type: MT4115
> > 	Number of ports: 1
> > 	Firmware version: 12.25.1020
> > 	Hardware version: 0
> > 	Node GUID: 0xe41d2d0300e70e87
> > 	System image GUID: 0xe41d2d0300e70e86
> > 	Port 1:
> > 		State: Active
> > 		Physical state: LinkUp
> > 		Rate: 100
> > 		Base lid: 30
> > 		LMC: 0
> > 		SM lid: 1
> > 		Capability mask: 0x2659e848
> > 		Port GUID: 0xe41d2d0300e70e87
> > 		Link layer: InfiniBand
> > CA 'mlx5_0'
> > 	CA type: MT4115
> > 	Number of ports: 1
> > 	Firmware version: 12.25.1020
> > 	Hardware version: 0
> > 	Node GUID: 0xe41d2d0300e70e86
> > 	System image GUID: 0xe41d2d0300e70e86
> > 	Port 1:
> > 		State: Active
> > 		Physical state: LinkUp
> > 		Rate: 100
> > 		Base lid: 20
> > 		LMC: 0
> > 		SM lid: 13
> > 		Capability mask: 0x2659e848
> > 		Port GUID: 0xe41d2d0300e70e86
> > 		Link layer: InfiniBand
> > 
> > 
> > 
> 



