[mvapich-discuss] error IBV_WC_LOC_LEN_ERR and FATAL event IBV_EVENT_QP_LAST_WQE_REACHED
Michael Ethier
methier at CGR.Harvard.edu
Mon Jan 7 10:49:14 EST 2008
Hello,
I am new to this forum and am hoping someone can help me solve the
following problem.
We have a modeling application that initializes and runs fine over an
ordinary Ethernet connection.
When we compile it against the InfiniBand software package (mvapich-0.9.9)
and run it, the application fails with the following output
at the end:
[0:moorcrofth] Abort: [moorcrofth:0] Got completion with error IBV_WC_LOC_LEN_ERR, code=1, dest rank=1
at line 388 in file viacheck.c
[0:moorcrofth] Abort: [0] Got FATAL event IBV_EVENT_QP_LAST_WQE_REACHED, code=16
at line 2552 in file viacheck.c
mpirun_rsh: Abort signaled from [0 : moorcrofth] remote host is [1 : moorcroft8 ]
forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)
done.
This seems to occur during the initialization phase, when communication
starts between different nodes.
If I set the hostfile to list the same node for every process, so that
all the CPUs used are on one node, the application initializes and runs fine.
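To make the two cases concrete, here is a rough sketch of the two hostfile layouts, assuming the usual one-hostname-per-line format. The file names hosts.single and hosts.multi are made up for illustration, the host names are the two that appear in the error output above, and the placement of the third rank is a guess, since only ranks 0 and 1 show up in the log.

```shell
#!/bin/sh
# Sketch only: file names are hypothetical; host names come from the
# error output (rank 0 on moorcrofth, rank 1 on moorcroft8).

# Case 1: all three ranks on one node (this layout initializes and runs).
cat > hosts.single <<'EOF'
moorcrofth
moorcrofth
moorcrofth
EOF

# Case 2: ranks spread over two nodes (this layout aborts).
# Where the third rank lands is an assumption, not stated in the post.
cat > hosts.multi <<'EOF'
moorcrofth
moorcroft8
moorcrofth
EOF

# Count the distinct nodes named in each hostfile.
single_nodes=$(sort -u hosts.single | wc -l | tr -d ' ')
multi_nodes=$(sort -u hosts.multi | wc -l | tr -d ' ')
echo "hosts.single: $single_nodes node(s)"
echo "hosts.multi: $multi_nodes node(s)"

# The job would then be launched along these lines (assuming the
# -hostfile option of mpirun_rsh; the application command is left out,
# as in the post):
#   mpirun_rsh -rsh -np 3 -hostfile hosts.multi <application>
```

Only the second layout puts ranks on both nodes, so only it exercises the InfiniBand path between moorcrofth and moorcroft8.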
We are using Red Hat Enterprise Linux 4 Update 5 on x86_64:
uname -a
Linux moorcrofth 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007
x86_64 x86_64 x86_64 GNU/Linux
In addition, we are using mvapich-0.9.9 as our InfiniBand software
package, with the Intel 9.1 compilers:
[gb16@moorcrofth 60]$ /usr/mpi/intel/mvapich-0.9.9/bin/mpicc --version
icc (ICC) 9.1 20070510
Copyright (C) 1985-2007 Intel Corporation. All rights reserved.
[gb16@moorcrofth 60]$ /usr/mpi/intel/mvapich-0.9.9/bin/mpif90 --version
ifort (IFORT) 9.1 20070510
Copyright (C) 1985-2007 Intel Corporation. All rights reserved.
We are using rsh as the remote launch protocol for this:
/usr/mpi/intel/mvapich-0.9.9/bin/mpirun_rsh -rsh -np 3 ........
Can anyone suggest how this problem can be solved?
Thank you in advance,
Mike