[mvapich-discuss] IBV_EVENT_QP_LAST_WQE_REACHED error
Steve Jones
stevejones at stanford.edu
Tue Oct 2 12:53:55 EDT 2007
> Can you do these couple of checks?
> 1. Making sure that the IB installation is the same on both the nodes.
> 2. Using mpirun_rsh instead of mpiexec.
> 3. Disabling shared memory collectives by using the environment variable
> VIADEV_USE_SHMEM_COLL=0,
Hi Amith.
I checked that the installation is the same on both nodes, switched to
mpirun_rsh, and set VIADEV_USE_SHMEM_COLL=0. I also ran ldd to verify
which shared libraries the failing binary loads on each node. No change:
the job still aborts with the same error.
Let me know what else I can do to debug.
Thanks.
Steve
[smjones at compute-5-0 Test_for_Steve]$ env |grep VIA
VIADEV_USE_SHMEM_COLL=0
[smjones at compute-5-0 Test_for_Steve]$ /share/apps/mvapich/intel/bin/mpirun_rsh -ssh -np 16 -hostfile $PBS_NODEFILE ~/NGA/bin/arts
No input file name was detected, using "input".
Step Time CFLmax Umax Vmax Wmax Divergence
mpirun_rsh: Abort signaled from [0]
[0:compute-5-0.local] Abort: [0] Got FATAL event IBV_EVENT_QP_LAST_WQE_REACHED, code=16
at line 2551 in file viacheck.c
done.
[smjones at compute-5-0 Test_for_Steve]$ /share/apps/mvapich/intel/bin/mpirun_rsh -np 16 -hostfile $PBS_NODEFILE ~/NGA/bin/arts
No input file name was detected, using "input".
Step Time CFLmax Umax Vmax Wmax Divergence
[0:compute-5-0.local] Abort: [0] Got FATAL event IBV_EVENT_QP_LAST_WQE_REACHED, code=16
mpirun_rsh: Abort signaled from [0]
at line 2551 in file viacheck.c
done.
[smjones at compute-5-0 Test_for_Steve]$ mpiexec -npernode 1 ldd ~/NGA/bin/arts
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x0000002a95573000)
libibumad.so.1 => /usr/lib64/libibumad.so.1 (0x0000002a9567f000)
libibcommon.so.1 => /usr/lib64/libibcommon.so.1 (0x0000002a95789000)
libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003133900000)
librt.so.1 => /lib64/tls/librt.so.1 (0x0000003134300000)
libm.so.6 => /lib64/tls/libm.so.6 (0x0000003133700000)
libc.so.6 => /lib64/tls/libc.so.6 (0x0000003133200000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003134100000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003133500000)
/lib64/ld-linux-x86-64.so.2 (0x0000003133000000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x0000002a95573000)
libibumad.so.1 => /usr/lib64/libibumad.so.1 (0x0000002a9567f000)
libibcommon.so.1 => /usr/lib64/libibcommon.so.1 (0x0000002a95789000)
libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003817c00000)
librt.so.1 => /lib64/tls/librt.so.1 (0x0000003818200000)
libm.so.6 => /lib64/tls/libm.so.6 (0x0000003817a00000)
libc.so.6 => /lib64/tls/libc.so.6 (0x0000003817500000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003818800000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003817800000)
/lib64/ld-linux-x86-64.so.2 (0x0000003817300000)
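The two ldd listings above differ only in their load addresses, which can legitimately vary from node to node. Below is a minimal sketch of checking this mechanically, assuming each node's ldd output has been saved as a string. The helper names resolved_libs and diff_nodes are hypothetical. The sketch drops the addresses and reports any library whose resolved path differs between the two nodes. It compares paths only, not library versions or file contents.

```python
import re

# Matches ldd lines such as
#   "libc.so.6 => /lib64/tls/libc.so.6 (0x0000003133200000)"
#   "/lib64/ld-linux-x86-64.so.2 (0x0000003133000000)"
# Group 1 is the library name, group 2 the resolved path (if any).
# The trailing load address is matched but not captured.
LDD_LINE = re.compile(r"^\s*(\S+)(?:\s+=>\s+(\S+))?\s+\(0x[0-9a-fA-F]+\)\s*$")


def resolved_libs(ldd_output):
    """Map each library name to the path it resolves to, ignoring load addresses."""
    libs = {}
    for line in ldd_output.splitlines():
        m = LDD_LINE.match(line)
        if m:
            name = m.group(1)
            # Lines without "=>" (e.g. the dynamic loader) name the path directly.
            libs[name] = m.group(2) or name
    return libs


def diff_nodes(ldd_a, ldd_b):
    """Return {library: (path_on_a, path_on_b)} for libraries that differ."""
    a, b = resolved_libs(ldd_a), resolved_libs(ldd_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

Called on the two listings above, diff_nodes should return an empty dict, since every library resolves to the same path on both nodes.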