[mvapich-discuss] FATAL event IBV_EVENT_QP_LAST_WQE_REACHED

Nathan Dauchy Nathan.Dauchy at noaa.gov
Sun Aug 5 09:53:23 EDT 2007


Sayantan,

Thanks for the quick reply!

We were previously running openIB gen2 (r-7.6.07 I think) and MVAPICH
0.9.8.  In that environment, several benchmarks were run, as well as
many user codes, and I'm not aware of any problems related to
QP_LAST_WQE_REACHED.

Since upgrading, we have not yet run many benchmarks.  I'm out of the
office for the next week but will see about running the ones you suggest
when I get back.

Thanks,
Nathan


Sayantan Sur wrote:
> Hi Nathan,
> 
> Nathan Dauchy wrote:
>> Pierrick, All,
>>
>> We recently upgraded to OFED-1.2 (mvapich-0.9.9) and are now getting an
>> error that looks similar to yours:
>>
>> [0:w72] Abort: [0] Got FATAL event IBV_EVENT_QP_LAST_WQE_REACHED,
>> code=16 at line 2552 in file viacheck.c
>>
>> Did you ever find a solution?  (I can't find one in the archives.)
>>
>> Can someone explain what the IBV_EVENT_QP_LAST_WQE_REACHED error means?
>> I can't find any clues in the source, and have been unable to turn up
>> any relevant docs either.
>>
>> Thanks for any help and clues you can offer!
>>   
> 
> 
> Thanks for reporting the problem. The event
> IBV_EVENT_QP_LAST_WQE_REACHED means that the QP (internal InfiniBand
> communication channel) is in an error state and all requests are
> consumed. Could it be related to a setup issue? Can you run any other
> MPI programs such as OSU benchmarks, IMB etc. on all these nodes?
> 
> Thanks,
> Sayantan.
> 
>> Regards,
>> Nathan
>>
>>
>> Pierrick Penven, Tue Mar 27 12:33:03 EDT 2007:
>>  
>>> Dear all,
>>>
>>> I am trying to install an ocean model on a cluster based on 64-bit
>>> bi-dual-core AMD Opterons with InfiniBand, using pathf90 and mvapich
>>> v0.9.8.
>>>
>>> The model runs and scales well on 1 node, but is not able to run
>>> on several nodes.  I have tried to use mvapich v0.9.9.beta, and I
>>> get the following message:
>>>
>>> [1:chpcc060] Abort: [1:chpcc060] Abort: [chpcc060:1] Got completion
>>> with error IBV_WC_LOC_PROT_ERR, code=4 at line 2374 in file viacheck.c
>>> [1] Got FATAL event IBV_EVENT_QP_LAST_WQE_REACHED, code=16 at line
>>> 2554 in file viacheck.c
>>> [0:chpcc058] Abort: [0] Got FATAL event
>>> IBV_EVENT_QP_LAST_WQE_REACHED, code=16
>>>  at line 2554 in file viacheck.c
>>> /CHPC/usr/local/mvapich_099b/bin/mpirun: line 1: 17412
>>> Terminated              /CHPC/usr/local/mvapich_099b/bin/mpirun_rsh
>>> -np 2 -hostfile /CHPC/home/loadl/execute/chpcln.4269.0.machinefile
>>> /CHPC/home/ppenven/Roms_tools/TEST1/./roms
>>>
>>> The problem does not occur when using MPI over IP rather than VAPI.
>>>
>>> Is there a solution to this problem?
>>>
>>> Thanks a lot
>>>
>>> Pierrick
>>>     
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>   
> 
> 
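
For anyone triaging similar aborts, here is a minimal sketch of pulling the event name, code, and source location out of the abort lines quoted above, so that an earlier completion error (such as the IBV_WC_LOC_PROT_ERR in Pierrick's log) is not overlooked behind the later IBV_EVENT_QP_LAST_WQE_REACHED lines. The names ABORT_RE, parse_aborts, and first_completion_error are hypothetical, and the pattern assumes only the message layout seen in this thread, not anything taken from the MVAPICH source.

```python
import re

# Assumed layout, based only on the samples quoted in this thread:
#   "Got FATAL event <NAME>, code=<N> at line <L> in file <F>"
#   "Got completion with error <NAME>, code=<N> at line <L> in file <F>"
ABORT_RE = re.compile(
    r"Got (?P<kind>FATAL event|completion with error)\s+"
    r"(?P<name>IBV_\w+),\s*code=(?P<code>\d+)\s+"
    r"at line\s+(?P<line>\d+)\s+in file\s+(?P<file>\S+)"
)


def parse_aborts(log_text):
    """Return one dict per abort record found in log_text, in order."""
    # Drop e-mail quote markers (">", ">>", ...) at the start of each line.
    lines = [re.sub(r"^\s*(?:>\s?)+", "", ln) for ln in log_text.splitlines()]
    # Collapse all whitespace so records wrapped across lines still match.
    flat = " ".join(" ".join(lines).split())
    records = []
    for m in ABORT_RE.finditer(flat):
        records.append({
            "kind": m.group("kind"),
            "name": m.group("name"),
            "code": int(m.group("code")),
            "line": int(m.group("line")),
            "file": m.group("file"),
        })
    return records


def first_completion_error(records):
    """Return the first work-completion error record, or None if absent."""
    for rec in records:
        if rec["kind"] == "completion with error":
            return rec
    return None


# Sample copied from the quoted log in this thread, quote markers included.
sample = """\
>>> [1:chpcc060] Abort: [1:chpcc060] Abort: [chpcc060:1] Got completion
>>> with error IBV_WC_LOC_PROT_ERR, code=4 at line 2374 in file viacheck.c
>>> [1] Got FATAL event IBV_EVENT_QP_LAST_WQE_REACHED, code=16 at line
>>> 2554 in file viacheck.c
>>> [0:chpcc058] Abort: [0] Got FATAL event
>>> IBV_EVENT_QP_LAST_WQE_REACHED, code=16
>>>  at line 2554 in file viacheck.c
"""

records = parse_aborts(sample)
for rec in records:
    print(rec["name"], rec["code"], rec["file"], rec["line"])

root = first_completion_error(records)
if root is not None:
    print("first completion error:", root["name"])
```

Treating the first completion error as the more informative record is a triage heuristic, not a rule. A log like the one from w72, which shows only the FATAL event, returns None from first_completion_error and needs other evidence.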