[mvapich-discuss] mvapich2.3.4 mcast error

Gong-Do Hwang grover.hwang at gmail.com
Thu Aug 20 05:25:04 EDT 2020


Hi,

I am using MVAPICH2 2.3.4 to run WRF 4.1.5 over MLNX OFED 5.0. When I run on
more than 8 nodes, I get the error messages below as soon as the WRF
integration starts:

[cn01:mpi_rank_0][mv2_mcast_resend_window] Failed to post mcast send errno:12
[cn01:mpi_rank_0][mv2_mcast_resend_window] Failed to post mcast send errno:12
[cn01:mpi_rank_0][mv2_mcast_resend_window] Failed to post mcast send errno:12

mvapich2 was configured with the following flags:
./configure --prefix=$prefix --with-device=ch3:mrail --with-rdma=gen2 \
    --enable-threads=multiple --enable-rdma-cm --enable-threads=multiple \
    --enable-romio --with-ch3-rank-bits=32 --with-ib-include=/usr/include \
    --with-ib-libpath=/usr/lib64 --with-ibverbs-include=/usr/include \
    --with-ibverbs-lib=/usr/lib64 CC=icc CFLAGS="-fPIC" F77=ifort \
    FFLAGS="-fPIC" FC=ifort FCFLAGS="-fPIC" CXX=icpc CXXFLAGS="-fPIC"

The run script and its arguments are:

export MV2_ENABLE_AFFINITY=0
export MV2_IBA_HCA=mlx5_0
mpiexec -rmk pbs -np ${NPROCS} ./wrf.exe

I set the HCA explicitly because the compute nodes also have a second
Mellanox Ethernet HCA.

Where can I find what errno 12 means? And is there a workaround for this?
Thanks so much for your help!

Grover

