[mvapich-discuss] IBV_EVENT_QP_LAST_WQE_REACHED error

Steve Jones stevejones at stanford.edu
Tue Oct 2 12:53:55 EDT 2007


> Can you do these couple of checks?
> 1. Making sure that the IB installation is the same on both the nodes.
> 2. Using mpirun_rsh instead of mpiexec.
> 3. Disabling shared memory collectives by using the environment variable
>   VIADEV_USE_SHMEM_COLL=0,

Hi Amith.

I checked the IB installation on the nodes, switched to mpirun_rsh, and
set the environment variable. I also ran ldd to verify which libraries
are loaded in the failing session. No change: the job still aborts with
the same error.

Let me know what else I can do to debug.

Thanks.

Steve

[smjones@compute-5-0 Test_for_Steve]$ env | grep VIA
VIADEV_USE_SHMEM_COLL=0

[smjones@compute-5-0 Test_for_Steve]$ /share/apps/mvapich/intel/bin/mpirun_rsh -ssh -np 16 -hostfile $PBS_NODEFILE ~/NGA/bin/arts
  No input file name was detected, using "input".
         Step        Time        CFLmax      Umax        Vmax        Wmax    Divergence
mpirun_rsh: Abort signaled from [0]
[0:compute-5-0.local] Abort: [0] Got FATAL event IBV_EVENT_QP_LAST_WQE_REACHED, code=16
  at line 2551 in file viacheck.c
done.

[smjones@compute-5-0 Test_for_Steve]$ /share/apps/mvapich/intel/bin/mpirun_rsh -np 16 -hostfile $PBS_NODEFILE ~/NGA/bin/arts
  No input file name was detected, using "input".
         Step        Time        CFLmax      Umax        Vmax        Wmax    Divergence
[0:compute-5-0.local] Abort: [0] Got FATAL event IBV_EVENT_QP_LAST_WQE_REACHED, code=16
mpirun_rsh: Abort signaled from [0]
  at line 2551 in file viacheck.c
done.

[smjones@compute-5-0 Test_for_Steve]$ mpiexec -npernode 1 ldd ~/NGA/bin/arts
         libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x0000002a95573000)
         libibumad.so.1 => /usr/lib64/libibumad.so.1 (0x0000002a9567f000)
         libibcommon.so.1 => /usr/lib64/libibcommon.so.1 (0x0000002a95789000)
         libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003133900000)
         librt.so.1 => /lib64/tls/librt.so.1 (0x0000003134300000)
         libm.so.6 => /lib64/tls/libm.so.6 (0x0000003133700000)
         libc.so.6 => /lib64/tls/libc.so.6 (0x0000003133200000)
         libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003134100000)
         libdl.so.2 => /lib64/libdl.so.2 (0x0000003133500000)
         /lib64/ld-linux-x86-64.so.2 (0x0000003133000000)
         libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x0000002a95573000)
         libibumad.so.1 => /usr/lib64/libibumad.so.1 (0x0000002a9567f000)
         libibcommon.so.1 => /usr/lib64/libibcommon.so.1 (0x0000002a95789000)
         libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003817c00000)
         librt.so.1 => /lib64/tls/librt.so.1 (0x0000003818200000)
         libm.so.6 => /lib64/tls/libm.so.6 (0x0000003817a00000)
         libc.so.6 => /lib64/tls/libc.so.6 (0x0000003817500000)
         libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003818800000)
         libdl.so.2 => /lib64/libdl.so.2 (0x0000003817800000)
         /lib64/ld-linux-x86-64.so.2 (0x0000003817300000)
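Check 1 (same IB installation on both nodes) can be partly automated by comparing ldd listings like the two above. The load addresses differ between nodes (see the libpthread and libc lines) and are not significant, so this minimal Python sketch strips them and compares only where each library resolves. The names parse_ldd and diff_ldd are hypothetical helpers, not part of MVAPICH or any existing tool.

```python
import re

# A load address such as " (0x0000002a95573000)". Addresses differ from node
# to node and carry no configuration information, so they are stripped.
_LOAD_ADDR = re.compile(r"\s*\(0x[0-9a-fA-F]+\)")


def parse_ldd(text: str) -> dict[str, str]:
    """Map each library in one node's ldd output to the path it resolves to."""
    libs: dict[str, str] = {}
    for raw in text.splitlines():
        line = _LOAD_ADDR.sub("", raw).strip()
        if "=>" in line:
            # "libibverbs.so.1 => /usr/lib64/libibverbs.so.1"
            # or "libfoo.so.1 => not found" (kept, so a missing library shows up)
            name, _, path = line.partition("=>")
            libs[name.strip()] = path.strip()
        elif line.startswith("/"):
            # The dynamic loader is listed by absolute path only.
            libs[line] = line
    return libs


def diff_ldd(node_a: str, node_b: str) -> dict[str, tuple]:
    """Return {library: (path_on_a, path_on_b)} for every library that
    resolves differently on the two nodes; None means "not listed"."""
    a, b = parse_ldd(node_a), parse_ldd(node_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

To use it, save the output of `ldd ~/NGA/bin/arts` from each node as a string and print `diff_ldd(a, b)`; an empty dict means every shared library resolves to the same path on both nodes. Matching paths do not prove matching library versions, so comparing checksums of the listed files would be a further step.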


