[Mvapich-discuss] RDMA CM messages
Lana Deere
lana.deere at gmail.com
Tue Jul 27 18:24:15 EDT 2021
I'm using MVAPICH2 2.3.5 on CentOS 7.
I've got an MPI job which is failing intermittently. One of the failure
symptoms is a hang in MPI_Init_thread, with this traceback:
/lib64/libpthread.so.0 read
libmpi.so.12 PMIU_readline
libmpi.so.12
libmpi.so.12 UPMI_BARRIER
libmpi.so.12 rdma_cm_exchange_hostid
libmpi.so.12 MPIDI_CH3I_RDMA_CM_Init
libmpi.so.12 MPIDI_CH3_Init
libmpi.so.12 MPID_Init
libmpi.so.12 MPIR_Init_thread
libmpi.so.12 MPI_Init_thread
A run which didn't fail produced this warning:
Warning: RDMA CM Initialization failed. Continuing without RDMA CM support.
Please set MV2_USE_RDMA_CM=0 to disable RDMA CM.
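One way to try the workaround the warning names, and to catch the hang rather than wait on it, is a small wrapper around the launch command. The following is a minimal sketch and not something taken from the MVAPICH2 documentation: the helper name run_with_timeout, the 300-second limit, and the example launch command are hypothetical, and whether the launcher forwards MV2_USE_RDMA_CM to remote ranks depends on the launcher and is not checked here.

```python
import os
import subprocess


def run_with_timeout(cmd, timeout_s, extra_env=None):
    """Run cmd with extra environment variables set.

    Returns (status, output), where status is 'ok' (exit code 0),
    'failed' (non-zero exit code) or 'hang' (no exit within timeout_s).
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    try:
        proc = subprocess.run(
            cmd,
            env=env,
            timeout=timeout_s,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
    except subprocess.TimeoutExpired as exc:
        # Only the direct child (the launcher) is killed on timeout;
        # remote ranks may need to be cleaned up separately.
        out = exc.stdout or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return "hang", out
    return ("ok" if proc.returncode == 0 else "failed"), proc.stdout


# Hypothetical usage; substitute the real launch command for the job:
#   status, output = run_with_timeout(
#       ["mpiexec", "-n", "4", "./my_job"],
#       timeout_s=300,
#       extra_env={"MV2_USE_RDMA_CM": "0"},
#   )
```

Running the job repeatedly through such a wrapper, with and without the variable set, would show whether the intermittent hang still occurs once RDMA CM is disabled.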
Does anyone have advice on tracking this down? Does it suggest a software
issue, or an InfiniBand hardware issue?
Thanks.
.. Lana (lana.deere at gmail.com)