[mvapich-discuss] Slurm 19.05 and mvapich2-2.3.2: "Warning: RDMA CM Initialization failed. Continuing without RDMA CM support. Please set MV2_USE_RDMA_CM=0 to disable RDMA CM."

Eg. Bo. egonle at aim.com
Sat Oct 19 14:40:40 EDT 2019


Hello *,
I have a question about Slurm 19.05 and mvapich2-2.3.2.
$ sinfo --version
slurm 19.05.2
mvapich2-2.3.2 was compiled using the following options (as given by the Slurm documentation).
$ ./configure --prefix=/opt/mympi232pmi2 --with-slurm=/opt/slurm/prod --with-pmi=pmi2 --with-pm=slurm --enable-slurm=yes

After that, mpi-hello-world was compiled using the newly built mpicc.
Unfortunately, we keep getting this warning, even for mpivars and mpi-hello-world:
"Warning: RDMA CM Initialization failed. Continuing without RDMA CM support. Please set MV2_USE_RDMA_CM=0 to disable RDMA CM."
My understanding is that RDMA CM is available and should be used.

The IB device installed is:
$ ibv_devinfo
hca_id: mlx5_0
        transport:                      InfiniBand (0)
        fw_ver:                         12.21.1000
        node_guid:                      <snip>
        sys_image_guid:                 <snip>
        vendor_id:                      0x02c9
        vendor_part_id:                 4115
        hw_ver:                         0x0
        board_id:                       MT_2180110032
        phys_port_cnt:                  1
        Device ports:
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 1
                        port_lid:               462
                        port_lmc:               0x00
                        link_layer:             InfiniBand
$ lsmod |sort|grep -i rdma
ib_cm                  51564  3 rdma_cm,ib_ucm,ib_ipoib
ib_core               280183  10 rdma_cm,ib_cm,iw_cm,mlx4_ib,mlx5_ib,ib_ucm,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib
ib_uverbs             103082  2 ib_ucm,rdma_ucm
iw_cm                  43514  1 rdma_cm
mlx_compat             16882  15 rdma_cm,ib_cm,iw_cm,mlx4_en,mlx4_ib,mlx5_ib,mlx5_fpga_tools,ib_ucm,ib_core,ib_umad,ib_uverbs,mlx4_core,mlx5_core,rdma_ucm,ib_ipoib
rdma_cm                54720  1 rdma_ucm
rdma_ucm               22793  0


My understanding is that it doesn't use the IB network, although it's available and configured (e.g. it has an IP address).
The rdma_server/rdma_client/rping communication tests finished successfully.

Thanks & Best
EgB