[mvapich-discuss] error IBV_WC_LOC_LEN_ERR and FATAL event IBV_EVENT_QP_LAST_WQE_REACHED

Michael Ethier methier at CGR.Harvard.edu
Mon Jan 7 10:49:14 EST 2008


Hello,

I am new to this forum and am hoping someone can help me solve the
following problem.

We have a modeling application that initializes and runs fine over an
ordinary Ethernet connection.

When we compile with the InfiniBand software package (mvapich-0.9.9)
and run, the application fails with the following output at the end:

[0:moorcrofth] Abort: [moorcrofth:0] Got completion with error IBV_WC_LOC_LEN_ERR, code=1, dest rank=1
 at line 388 in file viacheck.c
[0:moorcrofth] Abort: [0] Got FATAL event IBV_EVENT_QP_LAST_WQE_REACHED, code=16
 at line 2552 in file viacheck.c
mpirun_rsh: Abort signaled from [0 : moorcrofth] remote host is [1 : moorcroft8 ]
forrtl: error (78): process killed (SIGTERM)
forrtl: error (78): process killed (SIGTERM)
done.

This seems to occur during the initialization phase, when communication
between different nodes starts.

If I set the hostfile to list the same node for every process, so that
all the CPUs used are on one node, the application initializes and runs
fine.
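For reference, a minimal sketch of what the first error in the log signals. As commonly documented for libibverbs, a receive completes with IBV_WC_LOC_LEN_ERR (value 1 in enum ibv_wc_status, matching "code=1" above) when the incoming message is larger than the buffer posted to receive it, or larger than the port's maximum message size. The Python below is only an illustrative model of that rule, not the verbs API; the helper name recv_completion_status and the 2**31 port limit are assumptions. The second message is consistent with this: IBV_EVENT_QP_LAST_WQE_REACHED (value 16 in enum ibv_event_type) is generally reported for a queue pair attached to a shared receive queue once that queue pair has moved to the error state. It may also explain why the single-node run works, since MVAPICH normally sends intra-node messages through shared memory rather than through the InfiniBand adapter.

```python
# Illustrative model only, NOT the libibverbs API: it mimics the two
# receive-side conditions that produce IBV_WC_LOC_LEN_ERR.
IBV_WC_SUCCESS = 0      # value in enum ibv_wc_status
IBV_WC_LOC_LEN_ERR = 1  # matches "code=1" in the log above

# Assumed typical InfiniBand port limit (ibv_port_attr.max_msg_sz);
# not measured on our hardware.
ASSUMED_PORT_MAX_MSG_SZ = 2**31


def recv_completion_status(msg_len, posted_buf_len,
                           port_max_msg_sz=ASSUMED_PORT_MAX_MSG_SZ):
    """Hypothetical helper: status a receive completion would carry.

    msg_len         -- size in bytes of the incoming message
    posted_buf_len  -- total length of the pre-posted receive buffer
    port_max_msg_sz -- maximum message size of the receiving port
    """
    if msg_len > posted_buf_len or msg_len > port_max_msg_sz:
        return IBV_WC_LOC_LEN_ERR
    return IBV_WC_SUCCESS


# A peer sends 16 KiB into an 8 KiB pre-posted buffer (hypothetical sizes).
print(recv_completion_status(16384, 8192))  # prints 1 (IBV_WC_LOC_LEN_ERR)
print(recv_completion_status(4096, 8192))   # prints 0 (IBV_WC_SUCCESS)
```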

We are using Red Hat Enterprise Linux 4 Update 5 on x86_64:

uname -a
Linux moorcrofth 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

In addition, we are using mvapich-0.9.9 as our InfiniBand software
package, with the Intel 9.1 compilers:

[gb16 at moorcrofth 60]$ /usr/mpi/intel/mvapich-0.9.9/bin/mpicc --version
icc (ICC) 9.1 20070510
Copyright (C) 1985-2007 Intel Corporation.  All rights reserved.

[gb16 at moorcrofth 60]$ /usr/mpi/intel/mvapich-0.9.9/bin/mpif90 --version
ifort (IFORT) 9.1 20070510
Copyright (C) 1985-2007 Intel Corporation.  All rights reserved.

We start the job with mpirun_rsh over rsh:

/usr/mpi/intel/mvapich-0.9.9/bin/mpirun_rsh -rsh -np 3 ........

Can anyone suggest how to solve this problem?

Thank you in advance,

Mike