[Mvapich-discuss] MVAPICH 2.3.6 HCA Support for VMXNET3 or PVRDMA VMware Network Adapters

Clark, Nicholas - 1002 - MITLL Nicholas.Clark at ll.mit.edu
Wed Dec 1 16:22:17 EST 2021


Dear Hari,

 

I noticed that state message too and thought it was odd, since I was able to do
pings and other network traffic over the interface. After looking into it
further, I realized that I hadn't fully enabled PVRDMA on the hypervisor. I
went ahead and added a VMkernel adapter and firewall rules per VMware's
guidance, then restarted the interface in the VM, and now the state shows
PORT_ACTIVE (4). I also see more details in the output:

 

 

hca_id: vmw_pvrdma0

        transport:                      InfiniBand (0)

        fw_ver:                         3.0.000

        node_guid:                      0050:5600:009b:f2fe

        sys_image_guid:                 0000:0000:0000:0000

        vendor_id:                      0x15ad

        vendor_part_id:                 2080

        hw_ver:                         0x1

        board_id:                       1

        phys_port_cnt:                  1

        max_mr_size:                    0xffffffff

        page_size_cap:                  0xc

        max_qp:                         32768

        max_qp_wr:                      1024

        device_cap_flags:               0x00201400

                                        PORT_ACTIVE_EVENT

                                        RC_RNR_NAK_GEN

                                        MEM_MGT_EXTENSIONS

        max_sge:                        16

        max_sge_rd:                     16

        max_cq:                         4096

        max_cqe:                        262144

        max_mr:                         262144

        max_pd:                         4096

        max_qp_rd_atom:                 16

        max_ee_rd_atom:                 0

        max_res_rd_atom:                0

        max_qp_init_rd_atom:            128

        max_ee_init_rd_atom:            0

        atomic_cap:                     ATOMIC_NONE (0)

        max_ee:                         0

        max_rdd:                        0

        max_mw:                         0

        max_raw_ipv6_qp:                0

        max_raw_ethy_qp:                0

        max_mcast_grp:                  0

        max_mcast_qp_attach:            0

        max_total_mcast_qp_attach:      0

        max_ah:                         1048576

        max_fmr:                        0

        max_srq:                        4096

        max_srq_wr:                     1024

        max_srq_sge:                    16

        max_pkeys:                      128

        local_ca_ack_delay:             5

        general_odp_caps:

        rc_odp_caps:

                                        NO SUPPORT

        uc_odp_caps:

                                        NO SUPPORT

        ud_odp_caps:

                                        NO SUPPORT

        xrc_odp_caps:

                                        NO SUPPORT

        completion_timestamp_mask not supported

        core clock not supported

        device_cap_flags_ex:            0x201400

        tso_caps:

                max_tso:                        0

        rss_caps:

                max_rwq_indirection_tables:                     0

                max_rwq_indirection_table_size:                 0

                rx_hash_function:                               0x0

                rx_hash_fields_mask:                            0x0

        max_wq_type_rq:                 0

        packet_pacing_caps:

                qp_rate_limit_min:      0kbps

                qp_rate_limit_max:      0kbps

        tag matching not supported

        num_comp_vectors:               1

                port:   1

                        state:                  PORT_ACTIVE (4)

                        max_mtu:                4096 (5)

                        active_mtu:             4096 (5)

                        sm_lid:                 0

                        port_lid:               0

                        port_lmc:               0x00

                        link_layer:             Ethernet

                        max_msg_sz:             0x7fffffff

                        port_cap_flags:         0x04010000

                        port_cap_flags2:        0x0000

                        max_vl_num:             1 (1)

                        bad_pkey_cntr:          0x0

                        qkey_viol_cntr:         0x0

                        sm_sl:                  0

                        pkey_tbl_len:           1

                        gid_tbl_len:            6

                        subnet_timeout:         0

                        init_type_reply:        0

                        active_width:           1X (1)

                        active_speed:           2.5 Gbps (1)

                        phys_state:             LINK_UP (5)

                        GID[  0]:               fe80::250:56ff:fe9b:f2fe, RoCE v2

                        GID[  1]:               ::ffff:192.168.0.148, RoCE v2
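
For reference, the same port state and link layer can be checked programmatically with a small libibverbs sketch like the one below (this is not MVAPICH2 code; it assumes a single verbs device and port 1, and trims error handling):

#include <stdio.h>
#include <infiniband/verbs.h>

/* Sketch: open the first verbs device and report the state of port 1,
 * i.e. the "state:" field that ibv_devinfo prints above. */
int main(void)
{
    int num = 0;
    struct ibv_device **list = ibv_get_device_list(&num);
    if (!list || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(list[0]);
    struct ibv_port_attr port;
    if (ctx && ibv_query_port(ctx, 1, &port) == 0) {
        printf("%s port 1 state: %d (%s), link_layer: %s\n",
               ibv_get_device_name(list[0]), port.state,
               port.state == IBV_PORT_ACTIVE ? "PORT_ACTIVE" : "not active",
               port.link_layer == IBV_LINK_LAYER_ETHERNET ? "Ethernet" : "InfiniBand");
    }

    if (ctx)
        ibv_close_device(ctx);
    ibv_free_device_list(list);
    return 0;
}

Built with, e.g., gcc check_port.c -libverbs, it should now report PORT_ACTIVE and an Ethernet link layer for vmw_pvrdma0, matching the ibv_devinfo output above.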

 

 

The MVAPICH2 error appears to be the same though:

[host:mpi_rank_0][mv2_get_hca_type] **********************WARNING***********************

[host:mpi_rank_0][mv2_get_hca_type] Failed to automatically detect the HCA architecture.

[host:mpi_rank_0][mv2_get_hca_type] This may lead to subpar communication performance.

[host:mpi_rank_0][mv2_get_hca_type] ****************************************************

[host:mpi_rank_0][rdma_open_hca] Unknown HCA type: this build of MVAPICH2 does notfully support the HCA found on the system (try with other build options)

[cli_1]: aborting job:

Fatal error in MPI_Init:

Other MPI error, error stack:

MPIR_Init_thread(493)............:

MPID_Init(419)...................: channel initialization failed

MPIDI_CH3_Init(470)..............: rdma_get_control_parameters

rdma_get_control_parameters(1925): rdma_open_hca

rdma_open_hca(1080)..............: Failed to open HCA: No such file or directory
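
In case it is useful for the diagnosis, the identifiers that the HCA auto-detection presumably keys on (the vendor_id 0x15ad and vendor_part_id 2080 shown above) can also be read directly through the verbs API; a rough sketch, not MVAPICH2 code:

#include <stdio.h>
#include <infiniband/verbs.h>

/* Sketch: print the device attributes an MPI library's HCA detection
 * would typically inspect (vendor and part IDs from ibv_query_device).
 * Assumes a single verbs device; minimal error handling. */
int main(void)
{
    struct ibv_device **list = ibv_get_device_list(NULL);
    if (!list || !list[0]) {
        fprintf(stderr, "no verbs devices\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(list[0]);
    struct ibv_device_attr attr;
    if (ctx && ibv_query_device(ctx, &attr) == 0)
        printf("%s: vendor_id=0x%x vendor_part_id=%u hw_ver=0x%x\n",
               ibv_get_device_name(list[0]),
               attr.vendor_id, attr.vendor_part_id, attr.hw_ver);

    if (ctx)
        ibv_close_device(ctx);
    ibv_free_device_list(list);
    return 0;
}

Since 0x15ad is VMware's PCI vendor ID, presumably it is simply not in the list of HCA types the detection code recognizes.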

 

Sincerely,



Nicholas Clark

MIT Lincoln Laboratory 

ISR and Tactical Systems Division

Embedded and Open Systems Group

Systems Administration

244 Wood St., S3-487

Lexington, MA 02421-6426

(O): 781-981-9342

nicholas.clark at ll.mit.edu

 

From: Subramoni, Hari <subramoni.1 at osu.edu> 
Sent: Wednesday, December 1, 2021 12:42 PM
To: Clark, Nicholas - 1002 - MITLL <Nicholas.Clark at ll.mit.edu>
Cc: mvapich-discuss at lists.osu.edu; Subramoni, Hari <subramoni.1 at osu.edu>
Subject: RE: [Mvapich-discuss] MVAPICH 2.3.6 HCA Support for VMXNET3 or PVRDMA VMware Network Adapters

 

Hi, Nicholas.

 

According to the output, the port is down. That is probably why MVAPICH2 is
complaining that it cannot find an appropriate HCA. Can you please bring it
up and try again?

 

state:                  PORT_DOWN

 

Thx,

Hari.

 

From: Clark, Nicholas - 1002 - MITLL <Nicholas.Clark at ll.mit.edu>
Sent: Wednesday, December 1, 2021 11:19 AM
To: Subramoni, Hari <subramoni.1 at osu.edu>
Cc: mvapich-discuss at lists.osu.edu
Subject: RE: [Mvapich-discuss] MVAPICH 2.3.6 HCA Support for VMXNET3 or PVRDMA VMware Network Adapters

 

Dear Hari,

 

These are the results of ibv_devinfo:

hca_id: vmw_pvrdma0

        transport:                      InfiniBand (0)

        fw_ver:                         3.0.000

        node_guid:                      0050:5600:009b:71e5

        sys_image_guid:                 0000:0000:0000:0000

        vendor_id:                      0x15ad

        vendor_part_id:                 2080

        hw_ver:                         0x1

        board_id:                       1

        phys_port_cnt:                  1

        max_mr_size:                    0xffffffff

        page_size_cap:                  0xc

        max_qp:                         32768

        max_qp_wr:                      1024

        device_cap_flags:               0x00201400

                                        PORT_ACTIVE_EVENT

                                        RC_RNR_NAK_GEN

                                        MEM_MGT_EXTENSIONS

        max_sge:                        16

        max_sge_rd:                     16

        max_cq:                         4096

        max_cqe:                        262144

        max_mr:                         262144

        max_pd:                         4096

        max_qp_rd_atom:                 16

        max_ee_rd_atom:                 0

        max_res_rd_atom:                0

        max_qp_init_rd_atom:            128

        max_ee_init_rd_atom:            0

        atomic_cap:                     ATOMIC_NONE (0)

        max_ee:                         0

        max_rdd:                        0

        max_mw:                         0

        max_raw_ipv6_qp:                0

        max_raw_ethy_qp:                0

        max_mcast_grp:                  0

        max_mcast_qp_attach:            0

        max_total_mcast_qp_attach:      0

        max_ah:                         1048576

        max_fmr:                        0

        max_srq:                        4096

        max_srq_wr:                     1024

        max_srq_sge:                    16

        max_pkeys:                      128

        local_ca_ack_delay:             5

        general_odp_caps:

        rc_odp_caps:

                                        NO SUPPORT

        uc_odp_caps:

                                        NO SUPPORT

        ud_odp_caps:

                                        NO SUPPORT

        xrc_odp_caps:

                                        NO SUPPORT

        completion_timestamp_mask not supported

        core clock not supported

        device_cap_flags_ex:            0x201400

        tso_caps:

                max_tso:                        0

        rss_caps:

                max_rwq_indirection_tables:                     0

                max_rwq_indirection_table_size:                 0

                rx_hash_function:                               0x0

                rx_hash_fields_mask:                            0x0

        max_wq_type_rq:                 0

        packet_pacing_caps:

                qp_rate_limit_min:      0kbps

                qp_rate_limit_max:      0kbps

        tag matching not supported

        num_comp_vectors:               1

                port:   1

                        state:                  PORT_DOWN (1)

                        max_mtu:                4096 (5)

                        active_mtu:             4096 (5)

                        sm_lid:                 0

                        port_lid:               0

                        port_lmc:               0x00

                        link_layer:             Ethernet

                        max_msg_sz:             0x7fffffff

                        port_cap_flags:         0x04010000

                        port_cap_flags2:        0x0000

                        max_vl_num:             1 (1)

                        bad_pkey_cntr:          0x0

                        qkey_viol_cntr:         0x0

                        sm_sl:                  0

                        pkey_tbl_len:           1

                        gid_tbl_len:            6

                        subnet_timeout:         0

                        init_type_reply:        0

                        active_width:           1X (1)

                        active_speed:           2.5 Gbps (1)

                        phys_state:             LINK_UP (5)

                        GID[  0]:               fe80::250:56ff:fe9b:71e5, RoCE v2

 

 

I am not able to grant remote access to the VM.

 

Sincerely,



Nicholas Clark

MIT Lincoln Laboratory 

ISR and Tactical Systems Division

Embedded and Open Systems Group

Systems Administration

244 Wood St., S3-487

Lexington, MA 02421-6426

(O): 781-981-9342

nicholas.clark at ll.mit.edu

 

From: Subramoni, Hari <subramoni.1 at osu.edu>
Sent: Wednesday, December 1, 2021 11:12 AM
To: Clark, Nicholas - 1002 - MITLL <Nicholas.Clark at ll.mit.edu>
Cc: mvapich-discuss at lists.osu.edu; Subramoni, Hari <subramoni.1 at osu.edu>
Subject: RE: [Mvapich-discuss] MVAPICH 2.3.6 HCA Support for VMXNET3 or PVRDMA VMware Network Adapters

 

Hi, Nicholas.

 

MVAPICH2 has support for running in VMs.

 

Could you please send us the output of ibv_devinfo -v on the VM? Do you
think it is possible to get temporary remote access to debug the problem
ourselves?

 

Best,

Hari.

 

From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> On Behalf Of Clark, Nicholas - 1002 - MITLL via Mvapich-discuss
Sent: Tuesday, November 30, 2021 4:49 PM
To: mvapich-discuss at lists.osu.edu
Subject: [Mvapich-discuss] MVAPICH 2.3.6 HCA Support for VMXNET3 or PVRDMA VMware Network Adapters

 

Does MVAPICH2 have support for running in VMs with VMXNET3 or the PVRDMA
network adapter that supports RoCE v1/v2?

 

Currently, with the default build parameters and the native rdma-core libraries
on RHEL 8.5, I am seeing this message about an unknown HCA on both VMXNET3 and
PVRDMA:

 

[rdma_open_hca] Unknown HCA type: this build of MVAPICH2 does notfully support the HCA found on the system (try with other build options)

[cli_1]: aborting job:

Fatal error in MPI_Init:

Other MPI error, error stack:

MPIR_Init_thread(493)............:

MPID_Init(419)...................: channel initialization failed

MPIDI_CH3_Init(470)..............: rdma_get_control_parameters

rdma_get_control_parameters(1925): rdma_open_hca

rdma_open_hca(1080)..............: Failed to open HCA: No such file or directory
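
As a sanity check of what each configuration actually exposes to the guest, the verbs devices can be enumerated with a short program like the sketch below (plain VMXNET3 has no RDMA capability, so unless a software provider such as rxe is layered on top, no device is expected to appear; with PVRDMA attached, vmw_pvrdma0 should be listed):

#include <stdio.h>
#include <infiniband/verbs.h>

/* Sketch: enumerate the verbs devices visible inside the VM, roughly what
 * the ibv_devices utility prints. The GUID is shown raw (network byte order). */
int main(void)
{
    int num = 0;
    struct ibv_device **list = ibv_get_device_list(&num);
    if (!list) {
        fprintf(stderr, "RDMA device enumeration failed (is rdma-core installed?)\n");
        return 1;
    }

    printf("found %d verbs device(s)\n", num);
    for (int i = 0; i < num; i++)
        printf("  %s (node GUID 0x%016llx)\n",
               ibv_get_device_name(list[i]),
               (unsigned long long)ibv_get_device_guid(list[i]));

    ibv_free_device_list(list);
    return 0;
}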

 

Sincerely,



Nicholas Clark

MIT Lincoln Laboratory 

ISR and Tactical Systems Division

Embedded and Open Systems Group

Systems Administration

244 Wood St., S3-487

Lexington, MA 02421-6426

(O): 781-981-9342

nicholas.clark at ll.mit.edu

 
