[Mvapich-discuss] RDMA CM messages

Lana Deere lana.deere at gmail.com
Tue Jul 27 18:24:15 EDT 2021


I'm using MVAPICH2 2.3.5 on CentOS 7.

I've got an MPI job which is failing intermittently.  One of the failure
symptoms is a hang in MPI_Init_thread, with this traceback:
/lib64/libpthread.so.0  read
libmpi.so.12            PMIU_readline
libmpi.so.12
libmpi.so.12            UPMI_BARRIER
libmpi.so.12            rdma_cm_exchange_hostid
libmpi.so.12            MPIDI_CH3I_RDMA_CM_Init
libmpi.so.12            MPIDI_CH3_Init
libmpi.so.12            MPID_Init
libmpi.so.12            MPIR_Init_thread
libmpi.so.12            MPI_Init_thread

A run which didn't fail produced this warning:
Warning: RDMA CM Initialization failed. Continuing without RDMA CM support.
Please set MV2_USE_RDMA_CM=0 to disable RDMA CM.
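
For reference, here is a minimal sketch of how the suggestion in that warning
could be tried while also catching the hang.  It assumes the launcher forwards
the caller's environment to the ranks (not all launchers do; some want the
variable on their own command line).  The helper name run_without_rdma_cm and
the timeout value are hypothetical, not part of MVAPICH2:

```python
import os
import subprocess


def run_without_rdma_cm(cmd, timeout_s=60):
    """Run cmd with MV2_USE_RDMA_CM=0 set; return its exit code, or None on timeout."""
    env = dict(os.environ)
    # The variable and value are taken from the warning text above.
    env["MV2_USE_RDMA_CM"] = "0"
    try:
        return subprocess.run(cmd, env=env, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; treat this as a hang.
        return None


# Hypothetical usage (launcher and arguments are placeholders):
#   rc = run_without_rdma_cm(["mpiexec", "-n", "2", "./a.out"], timeout_s=120)
#   rc is None -> the job did not finish in time (possible hang in init)
```

If the hang goes away with RDMA CM disabled, that would at least narrow the
problem to the RDMA CM initialization path rather than the job itself.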

Does anyone have advice on tracking this down?  Does it suggest a software
issue, or an InfiniBand hardware issue?

Thanks.

.. Lana (lana.deere at gmail.com)