[mvapich-discuss] FATAL event IBV_EVENT_QP_LAST_WQE_REACHED

Nathan Dauchy Nathan.Dauchy at noaa.gov
Thu Aug 30 18:42:47 EDT 2007


Matt,

I'll try to get the user who is having this problem to run with those
environment variables set.

They did report this error as well:

[0:w71] Abort: [w71:0] Got completion with error IBV_WC_LOC_LEN_ERR,
code=1, dest rank=8 at line 388 in file viacheck.c

But I had assumed that was a "normal" exit condition for a failed run.
Is this not the case?
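
To tell which process failed first and which ones only exited afterwards,
the saved job output could be scanned for abort lines in order of
appearance. The following is only a minimal sketch, not part of MVAPICH:
it assumes every abort message looks like the one quoted above (a
"rank:host]" prefix followed by "Abort:"), and the function name and
regular expressions are hypothetical.

```python
import re

# Hypothetical patterns modeled on the single abort line quoted above.
# The opening bracket is optional so a truncated "0:w71]" prefix still matches.
HEADER_RE = re.compile(r"\[?(\d+):([^\]\s]+)\]\s+Abort:")
ERROR_RE = re.compile(r"IBV_[A-Z_]+")
DEST_RE = re.compile(r"dest rank=(\d+)")


def summarize_aborts(log_text):
    """Return (rank, host, error, dest_rank) tuples in order of appearance."""
    # Rejoin wrapped lines so a message split across two lines still matches.
    flat = " ".join(log_text.split())
    headers = list(HEADER_RE.finditer(flat))
    results = []
    for i, header in enumerate(headers):
        # Each message runs from its header to the start of the next header.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(flat)
        segment = flat[header.end():end]
        err = ERROR_RE.search(segment)
        dest = DEST_RE.search(segment)
        results.append((
            int(header.group(1)),
            header.group(2),
            err.group(0) if err else None,
            int(dest.group(1)) if dest else None,
        ))
    return results


if __name__ == "__main__":
    sample = (
        "0:w71] Abort: [w71:0] Got completion with error IBV_WC_LOC_LEN_ERR,\n"
        "code=1, dest rank=8 at line 388 in file viacheck.c\n"
    )
    for entry in summarize_aborts(sample):
        print(entry)
```

The first tuple in the list is the earliest abort in the output, which is
the one most likely to point at the actual problem; later entries may just
be other ranks exiting after the failure.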

Thanks,
Nathan


Matthew Koop wrote:
> Nathan,
> 
> Do you ever see any other errors, or is it just
> IBV_EVENT_QP_LAST_WQE_REACHED? Sometimes when a job fails one process will
> have an error and then the rest of the processes can exit with another
> error status unrelated to the problem.
> 
> Can you try running with VIADEV_USE_SHMEM_COLL=0 and see if that makes any
> difference?
> 
> e.g.
> mpirun_rsh -np 4 h1 h1 h2 h2 VIADEV_USE_SHMEM_COLL=0 ./IMB-MPI1
> 
> If that doesn't work, you can also try:
> VIADEV_USE_COALESCE=0
> 
> These will help us narrow down the problem a bit.
> 
> Thanks,
> Matt
> 
> 
> On Tue, 21 Aug 2007, Nathan Dauchy wrote:
> 
>> Updated...
>>
>> Nathan Dauchy wrote:
>>> I finally had time to get back to this issue...
>>>
>>> The OSU benchmarks run fine.
>>> The presta benchmarks run fine.
>>
>> IMB now runs fine.  I had the library path wrong when compiling.
>>
>>>> Sayantan Sur wrote:
>>>>> Thanks for reporting the problem. The event
>>>>> IBV_EVENT_QP_LAST_WQE_REACHED means that the QP (queue pair, the
>>>>> internal InfiniBand communication channel) is in an error state and
>>>>> all requests are consumed. Could it be related to a setup issue? Can
>>>>> you run any other MPI programs, such as the OSU benchmarks or IMB, on
>>>>> all these nodes?
>>>>>