[Mvapich-discuss] RDMA CM messages

Subramoni, Hari subramoni.1 at osu.edu
Wed Jul 28 07:46:42 EDT 2021


Hi, Lana.

It looks like IP addresses were not assigned to all the IB ports.
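As a quick check (a sketch; `ip` from iproute2 is assumed available, and IB interface names like `ib0`/`ib1` vary by system), you can list which IB interfaces actually have an IP address assigned:

```shell
# Brief per-interface address listing; IPoIB interfaces usually appear as ib0, ib1, ...
# Any IB interface with no inet/inet6 address listed has no IP assigned.
ip -br addr show | grep '^ib'
```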

As a workaround, can you please set MV2_USE_RDMA_CM=0 and try?
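For example (a sketch; the launcher invocation, process count, hostfile, and `./your_app` are placeholders to adapt to your setup), the variable can be exported before launch or passed inline on the mpirun_rsh command line:

```shell
# Disable RDMA CM for this run (the workaround suggested above)
export MV2_USE_RDMA_CM=0
# Then launch as usual, e.g.:
#   mpirun -np 4 ./your_app
# or with mpirun_rsh, passing the variable inline:
#   mpirun_rsh -np 4 -hostfile hosts MV2_USE_RDMA_CM=0 ./your_app
```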

Thx,
Hari.

PS: Please try and move to MVAPICH2 2.3.6. It has a lot of fixes and performance enhancements compared to the 2.3.5 release.

From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> On Behalf Of Lana Deere via Mvapich-discuss
Sent: Tuesday, July 27, 2021 6:24 PM
To: mvapich-discuss at lists.osu.edu
Subject: [Mvapich-discuss] RDMA CM messages

I'm using MVAPICH2 2.3.5 on CentOS 7.

I've got an MPI job which is failing intermittently.  One of the failure symptoms is a hang in MPI_Init_thread, with this traceback:
/lib64/libpthread.so.0  read
libmpi.so.12            PMIU_readline
libmpi.so.12
libmpi.so.12            UPMI_BARRIER
libmpi.so.12            rdma_cm_exchange_hostid
libmpi.so.12            MPIDI_CH3I_RDMA_CM_Init
libmpi.so.12            MPIDI_CH3_Init
libmpi.so.12            MPID_Init
libmpi.so.12            MPIR_Init_thread
libmpi.so.12            MPI_Init_thread

A run which didn't fail produced this warning:
Warning: RDMA CM Initialization failed. Continuing without RDMA CM support. Please set MV2_USE_RDMA_CM=0 to disable RDMA CM.

Does anyone have advice on tracking this down?  Does it suggest a software issue?  An InfiniBand hardware issue?

Thanks.

.. Lana (lana.deere at gmail.com<mailto:lana.deere at gmail.com>)


