[mvapich-discuss] mvapich2-2.3.3 over connectX-5 regression issue

Subramoni, Hari subramoni.1 at osu.edu
Fri Apr 10 20:26:52 EDT 2020


Hi, Honggang.

Glad to know that it works for you. I am still trying to understand how changing the launcher changes the IB HCA selection behavior in MVAPICH2. To the best of my knowledge, the two do not interact at all.

If you don't mind, can you let us know the following:

1. the output of ibstat on both nodes
2. what you mean by "IPoIB was configured on mlx5_0" (see the commands below)
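
For example, on each node (assuming the standard infiniband-diags and iproute2 tools are available):

    ibstat               # every HCA, its port state, and link layer
    ip addr show         # which netdev carries the IPoIB address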

Thx,
Hari.

-----Original Message-----
From: Honggang LI <honli at redhat.com> 
Sent: Friday, April 10, 2020 7:35 PM
To: Subramoni, Hari <subramoni.1 at osu.edu>
Cc: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: Re: [mvapich-discuss] mvapich2-2.3.3 over connectX-5 regression issue

On Fri, Apr 10, 2020 at 01:52:20PM +0000, Subramoni, Hari wrote:
> Hi, Honggang.
> 
> It looks like your systems have multiple network adapters that have been set up in different modes (IB and Ethernet). In such a scenario, I would recommend explicitly setting the network adapter you want MVAPICH2 to use.
> 
> e.g. MV2_IBA_HCA=mlx5_0 or MV2_IBA_HCA=mlx5_1

MV2_IBA_HCA=mlx5_1 works for both mpirun and mpirun_rsh. IPoIB is configured on mlx5_0. It seems mpirun and mpirun_rsh blindly pick the first HCA port.

The workaround works, but this is still a regression, because 2.3.2 did not need it.

Thanks

[root at rdma-virt-02 ~]$ rpm -qf /usr/lib64/mvapich2/bin/mpirun
mvapich2-2.3.3-1.el8.x86_64

[root at rdma-virt-02 ~]$ cat hfile_one_core
172.31.0.202
172.31.0.203

[root at rdma-virt-02 ~]$ ip addr show | grep -w 172.31.0.202
    inet 172.31.0.202/24 brd 172.31.0.255 scope global dynamic noprefixroute mlx5_ib0

[root at rdma-virt-02 ~]$ ip addr show mlx5_ib0
8: mlx5_ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:0b:ae:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:e7:0f:f6 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 172.31.0.202/24 brd 172.31.0.255 scope global dynamic noprefixroute mlx5_ib0
       valid_lft 2039sec preferred_lft 2039sec
    inet6 fe80::e61d:2d03:e7:ff6/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

[root at rdma-virt-02 ~]$ ibstat mlx5_0
CA 'mlx5_0'
	CA type: MT4115
	Number of ports: 1
	Firmware version: 12.25.1020
	Hardware version: 0
	Node GUID: 0xe41d2d0300e70ff6
	System image GUID: 0xe41d2d0300e70ff6
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 56
		Base lid: 19
		LMC: 0
		SM lid: 13
		Capability mask: 0x2659e848
		Port GUID: 0xe41d2d0300e70ff6  <===
		Link layer: InfiniBand

According to the "link/infiniband" hardware address and the port GUID, the HCA behind mlx5_ib0 is mlx5_0.
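
The same netdev-to-HCA mapping can also be cross-checked without comparing GUIDs by hand, e.g. with the iproute2 'rdma' tool (assuming it is installed):

    rdma link show       # should list each HCA port together with its netdev, e.g. mlx5_0/1 ... netdev mlx5_ib0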

[root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun -genv MV2_IBA_HCA=mlx5_0 -np 2 -hostfile /root/hfile_one_core hostname
rdma-virt-02.lab.bos.redhat.com
rdma-virt-03.lab.bos.redhat.com

[root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun -genv MV2_IBA_HCA=mlx5_0 -np 2 -hostfile /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
(hangs like the mpirun_rsh case, no output)


[root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun -genv MV2_IBA_HCA=mlx5_1 -np 2 -hostfile /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
# OSU MPI Latency Test v5.4.1
# Size          Latency (us)
0                       1.03
1                       1.08
2                       1.07
4                       1.07
8                       1.07
16                      1.11
32                      1.11
64                      1.13
128                     1.19
256                     1.59
512                     1.68
1024                    1.84
2048                    2.19
4096                    2.95
8192                    4.40
16384                   5.57
32768                   7.21
65536                   9.89
131072                 15.24
262144                 26.00
524288                 47.65
1048576                91.39
2097152               177.73
4194304               351.36

After adding 'export MV2_IBA_HCA=mlx5_1' to ~/.bashrc on both machines, mpirun_rsh works.

[root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun_rsh  -np 2 -hostfile /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
# OSU MPI Latency Test v5.4.1
# Size          Latency (us)
0                       1.04
1                       1.07
2                       1.07
4                       1.07
8                       1.06
16                      1.12
32                      1.12
64                      1.14
128                     1.19
256                     1.58
512                     1.66
1024                    1.82
2048                    2.17
4096                    2.93
8192                    4.39
16384                   5.52
32768                   7.18
65536                   9.87
131072                 15.25
262144                 25.97
524288                 47.62
1048576                91.80
2097152               177.88
4194304               350.79
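
For completeness: if I read the MVAPICH2 user guide correctly, mpirun_rsh also accepts environment variables as VAR=VALUE arguments before the executable, so the same override should work per-job without editing ~/.bashrc (untested here):

    /usr/lib64/mvapich2/bin/mpirun_rsh -np 2 -hostfile /root/hfile_one_core MV2_IBA_HCA=mlx5_1 /usr/lib64/mvapich2/bin/mpitests-osu_latency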


> Best,
> Hari.
> 
> -----Original Message-----
> From: mvapich-discuss-bounces at cse.ohio-state.edu 
> <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> On Behalf Of 
> Honggang LI
> Sent: Friday, April 10, 2020 3:56 AM
> To: mvapich-discuss at cse.ohio-state.edu 
> <mvapich-discuss at mailman.cse.ohio-state.edu>
> Subject: [mvapich-discuss] mvapich2-2.3.3 over connectX-5 regression 
> issue
> 
> hi
> 
> short summary:
> +----------+----------+-----------+
> |mvapich2  | mpirun   | mpirun_rsh|
> |version   |          |           |
> +----------+----------+-----------+
> |2.3.2     | works    | hang      |
> +----------+----------+-----------+
> |2.3.3     | failed   | hang      |
> +----------+----------+-----------+
> 
> Is it possible to run something like 'git bisect' to narrow down the source of this regression? It seems no public git repo is available.
> I don't know how to run 'git bisect' against an SVN repo.
> 
> thanks
> 
> [root at rdma-virt-02 ~]$ cat hfile_one_core
> 172.31.0.202
> 172.31.0.203
> [root at rdma-virt-02 ~]$ ip addr show | grep -w 172.31.0.202
>     inet 172.31.0.202/24 brd 172.31.0.255 scope global dynamic 
> noprefixroute mlx5_ib0
> 
> 
> [root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun  -np 2 -hostfile 
> /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
> [rdma-virt-02.lab.bos.redhat.com:mpi_rank_0][handle_cqe] Send desc 
> error in msg to 1, wc_opcode=0 
> [rdma-virt-02.lab.bos.redhat.com:mpi_rank_0][handle_cqe] Msg from 1: 
> wc.status=12, wc.wr_id=0x560c8bac9040, wc.opcode=0, 
> vbuf->phead->type=0 = MPIDI_CH3_PKT_EAGER_SEND 
> [rdma-virt-02.lab.bos.redhat.com:mpi_rank_0][handle_cqe] 
> src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:548: [] Got 
> completion with error 12, vendor code=0x81, dest rank=1
> : Protocol not supported (93)
> [rdma-virt-03.lab.bos.redhat.com:mpi_rank_1][handle_cqe] Send desc 
> error in msg to 0, wc_opcode=0 
> [rdma-virt-03.lab.bos.redhat.com:mpi_rank_1][handle_cqe] Msg from 0: 
> wc.status=12, wc.wr_id=0x563896cf9040, wc.opcode=0, 
> vbuf->phead->type=0 = MPIDI_CH3_PKT_EAGER_SEND 
> [rdma-virt-03.lab.bos.redhat.com:mpi_rank_1][handle_cqe] 
> src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:548: [] Got 
> completion with error 12, vendor code=0x81, dest rank=0
> : Protocol not supported (93)
> 
> [root at rdma-virt-02 ~]$ dnf downgrade mvapich2
> Updating Subscription Management repositories.
> Unable to read consumer identity
> This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
> Last metadata expiration check: 2:24:17 ago on Fri 10 Apr 2020 01:18:01 AM EDT.
> Dependencies resolved.
> ========================================================================================================================================
>  Package                       Architecture                Version                          Repository                             Size
> ======================================================================
> ==================================================================
> Downgrading:
>  mvapich2                      x86_64                      2.3.2-2.el8                      beaker-AppStream                      3.1 M
> 
> Transaction Summary
> ======================================================================
> ==================================================================
> Downgrade  1 Package
> 
> Total download size: 3.1 M
> Is this ok [y/N]: y
> Downloading Packages:
> mvapich2-2.3.2-2.el8.x86_64.rpm                                                                          39 MB/s | 3.1 MB     00:00
> ----------------------------------------------------------------------------------------------------------------------------------------
> Total                                                                                                    39 MB/s | 3.1 MB     00:00
> Running transaction check
> Transaction check succeeded.
> Running transaction test
> Transaction test succeeded.
> Running transaction
>   Preparing        :                                                                                                                1/1
>   Downgrading      : mvapich2-2.3.2-2.el8.x86_64                                                                                    1/2
>   Cleanup          : mvapich2-2.3.3-1.el8.x86_64                                                                                    2/2
>   Running scriptlet: mvapich2-2.3.3-1.el8.x86_64                                                                                    2/2
>   Verifying        : mvapich2-2.3.2-2.el8.x86_64                                                                                    1/2
>   Verifying        : mvapich2-2.3.3-1.el8.x86_64                                                                                    2/2
> Installed products updated.
> 
> Downgraded:
>   mvapich2-2.3.2-2.el8.x86_64
> 
> Complete!
> [root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun  -np 2 -hostfile 
> /root/hfile_one_core /usr/lib64/mvapich2/bin/mpitests-osu_latency
> # OSU MPI Latency Test v5.4.1
> # Size          Latency (us)
> 0                       1.24
> 1                       1.29
> 2                       1.29
> 4                       1.29
> 8                       1.29
> 16                      1.34
> 32                      1.35
> 64                      1.36
> 128                     1.42
> 256                     1.82
> 512                     1.92
> 1024                    2.11
> 2048                    2.53
> 4096                    3.48
> 8192                    5.19
> 16384                   7.37
> 32768                  10.12
> 65536                  15.00
> 131072                 24.69
> 262144                 44.15
> 524288                 82.97
> 1048576               160.92
> 2097152               316.19
> 4194304               626.91
> [root at rdma-virt-02 ~]$ /usr/lib64/mvapich2/bin/mpirun_rsh  -np 2 
> -hostfile /root/hfile_one_core 
> /usr/lib64/mvapich2/bin/mpitests-osu_latency
> 
> (hangs, no output)
> 
> [root at rdma-virt-03 ~]$ ibstat
> CA 'mlx5_bond_0'
>     CA type: MT4117
>     Number of ports: 1
>     Firmware version: 14.25.1020
>     Hardware version: 0
>     Node GUID: 0xe41d2d0300fda736
>     System image GUID: 0xe41d2d0300fda736
>     Port 1:
>         State: Active
>         Physical state: LinkUp
>         Rate: 25
>         Base lid: 0
>         LMC: 0
>         SM lid: 0
>         Capability mask: 0x00010000
>         Port GUID: 0xe61d2dfffefda736
>         Link layer: Ethernet
> CA 'mlx5_1'
>     CA type: MT4115
>     Number of ports: 1
>     Firmware version: 12.25.1020
>     Hardware version: 0
>     Node GUID: 0xe41d2d0300e70e87
>     System image GUID: 0xe41d2d0300e70e86
>     Port 1:
>         State: Active
>         Physical state: LinkUp
>         Rate: 100
>         Base lid: 30
>         LMC: 0
>         SM lid: 1
>         Capability mask: 0x2659e848
>         Port GUID: 0xe41d2d0300e70e87
>         Link layer: InfiniBand
> CA 'mlx5_0'
>     CA type: MT4115
>     Number of ports: 1
>     Firmware version: 12.25.1020
>     Hardware version: 0
>     Node GUID: 0xe41d2d0300e70e86
>     System image GUID: 0xe41d2d0300e70e86
>     Port 1:
>         State: Active
>         Physical state: LinkUp
>         Rate: 100
>         Base lid: 20
>         LMC: 0
>         SM lid: 13
>         Capability mask: 0x2659e848
>         Port GUID: 0xe41d2d0300e70e86
>         Link layer: InfiniBand
> 
> 
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> 



