[Mvapich-discuss] MVAPICH2-2.3.7pre-Rockportqos: Setting the MV2_DEFAULT_MTU cause MPICM_Init_UD_CM to fail

Nicolas Gagnon ngagnon at rockportnetworks.com
Fri Oct 8 10:22:58 EDT 2021


The problem started while testing the system with the cluster Ethernet MTU configured to 1500: we began getting the MPICM_Init_UD_CM error shown below. I reverted the cluster MTU to 9000 and confirmed that everything works as before. I then tried setting the default MTU to 4000 (as suggested in the message below), expecting the effective RDMA MTU to be 2048, but I still get this error even though the specified MTU is well above the limit. It seems that whenever we specify the default MTU, it does not work properly.

Is my understanding of “MV2_DEFAULT_MTU” correct: that it dictates the maximum system MTU, and that rdma_default_mtu should then be the largest RDMA MTU that fits within that default MTU?
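
As a rough illustration of the expectation described above (not MVAPICH2's actual logic), the following Python sketch picks the largest standard InfiniBand MTU size that fits within a given system MTU. The function name effective_rdma_mtu and the header_overhead parameter are hypothetical; the default overhead of 0 is a simplification, since real RoCE framing consumes some bytes of the Ethernet MTU.

```python
# Standard InfiniBand/RoCE path MTU sizes in bytes
# (the values behind the IBV_MTU_* enum in libibverbs).
IB_MTU_SIZES = (256, 512, 1024, 2048, 4096)


def effective_rdma_mtu(system_mtu, header_overhead=0):
    """Return the largest IB MTU that fits in the system MTU, or None.

    header_overhead is a hypothetical knob for protocol headers;
    0 is an assumption made here for simplicity.
    """
    budget = system_mtu - header_overhead
    fitting = [size for size in IB_MTU_SIZES if size <= budget]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    # The three MTU values mentioned in this message.
    for mtu in (1500, 4000, 9000):
        print(mtu, "->", effective_rdma_mtu(mtu))
```

Under this reading, a system MTU of 1500 would give 1024, 4000 would give 2048, and 9000 would give 4096, which is the behaviour I was expecting from MV2_DEFAULT_MTU=4000.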


[user at dell-s13-h1 ~]$ export MV2_DEFAULT_MTU=4000
[user at dell-s13-h1 ~]$ /opt/bm/hpc/mvapich2-2.3.7pre-rockportqos/bin/mpiexec -np 1504 -f /rockshare/ngagnon/dummy/intel-mpi-2021-09-30-1635/i_1024/rpn_32/syseng_48/mvapich-host.cfg -env MV2_HOMOGENEOUS_CLUSTER=1 -env MV2_HYBRID_ENABLE_THRESHOLD=102400 -env MV2_NDREG_ENTRIES_MAX=100000 -env MV2_NDREG_ENTRIES=50000 /opt/bm/hpc/mvapich2-2.3.7pre-rockportqos/libexec/osu-micro-benchmarks/mpi/collective/osu_alltoall -f -i 100 -m:4
[dell-s13-h2:mpi_rank_0][MPICM_Init_UD_CM] sizeof cm_msg (1296) >= rdma_default_mtu (1024).
[dell-s13-h2:mpi_rank_0][MPICM_Init_UD_CM] Try increasing the MV2_DEFAULT_MTU or reduce MAX_NUM_HCAS, or MAX_NUM_QP_PER_PORT in ibv_param.h.
[cli_0]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(493)...:
MPID_Init(419)..........: channel initialization failed
MPIDI_CH3_Init(581).....:
MPIDI_CH3I_CM_Init(2054):
MPICM_Init_UD_CM(2092)..:
(unknown)(): Other MPI error
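
Reading the error text literally, initialization appears to abort when the size of cm_msg is greater than or equal to rdma_default_mtu. The Python sketch below only mirrors that comparison as printed in the log; the helper name cm_msg_fits is hypothetical and the real check inside MPICM_Init_UD_CM may involve more than this.

```python
def cm_msg_fits(cm_msg_size, rdma_default_mtu):
    """Mirror of the logged condition: failure when size >= MTU."""
    return cm_msg_size < rdma_default_mtu


# Size reported in the error output above.
CM_MSG_SIZE = 1296

if __name__ == "__main__":
    for mtu in (1024, 2048, 4096):
        status = "ok" if cm_msg_fits(CM_MSG_SIZE, mtu) else "fails"
        print("rdma_default_mtu=%d: %s" % (mtu, status))
```

If this reading is right, an effective RDMA MTU of 2048 or 4096 would pass, and only the 1024 value reported in the log triggers the failure.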

Nicolas Gagnon
Principal Designer/Architect, Engineering
ngagnon at rockportnetworks.com
Rockport | Simplify the Network

