[Mvapich-discuss] RDMA CM messages
Subramoni, Hari
subramoni.1 at osu.edu
Wed Jul 28 07:46:42 EDT 2021
Hi, Lana.
It looks like IP addresses were not assigned to all the IB ports.
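A quick way to confirm this on each node (a sketch — it assumes IPoIB interface names like ib0/ib1, which vary by system; adjust the list for your setup):

```shell
# Check whether each IPoIB interface has an IPv4 address assigned.
# Interface names ib0/ib1 are assumptions; substitute your own.
for ifc in ib0 ib1; do
    if ip -4 addr show dev "$ifc" 2>/dev/null | grep -q 'inet '; then
        echo "$ifc: IPv4 address assigned"
    else
        echo "$ifc: no IPv4 address (RDMA CM will fail on this port)"
    fi
done
```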
As a workaround, can you please set MV2_USE_RDMA_CM=0 and try?
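For example (a sketch — `hosts` and ./app are placeholders for your hostfile and MPI binary):

```shell
# With mpirun_rsh, environment variables are passed on the command line:
mpirun_rsh -np 4 -hostfile hosts MV2_USE_RDMA_CM=0 ./app

# With the Hydra launcher, use -genv instead:
mpiexec -n 4 -genv MV2_USE_RDMA_CM 0 ./app
```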
Thx,
Hari.
PS: Please try and move to MVAPICH2 2.3.6. It has a lot of fixes and performance enhancements compared to the 2.3.5 release.
From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> On Behalf Of Lana Deere via Mvapich-discuss
Sent: Tuesday, July 27, 2021 6:24 PM
To: mvapich-discuss at lists.osu.edu
Subject: [Mvapich-discuss] RDMA CM messages
I'm using MVAPICH2 2.3.5 on CentOS 7.
I've got an MPI job which is failing intermittently. One of the failure symptoms is a hang in MPI_Init_thread, with this traceback:
/lib64/libpthread.so.0 read
libmpi.so.12 PMIU_readline
libmpi.so.12
libmpi.so.12 UPMI_BARRIER
libmpi.so.12 rdma_cm_exchange_hostid
libmpi.so.12 MPIDI_CH3I_RDMA_CM_Init
libmpi.so.12 MPIDI_CH3_Init
libmpi.so.12 MPID_Init
libmpi.so.12 MPIR_Init_thread
libmpi.so.12 MPI_Init_thread
A run which didn't fail produced this warning:
Warning: RDMA CM Initialization failed. Continuing without RDMA CM support. Please set MV2_USE_RDMA_CM=0 to disable RDMA CM.
Does anyone have advice on tracking this down? Does it suggest a software issue? An InfiniBand hardware issue?
Thanks.
.. Lana (lana.deere at gmail.com)