[mvapich-discuss] FATAL event IBV_EVENT_QP_LAST_WQE_REACHED
Nathan Dauchy
Nathan.Dauchy at noaa.gov
Thu Aug 30 18:42:47 EDT 2007
Matt,
I'll try to get the user who is having this problem to run with those
environment variables set.
They did report this error as well:
[0:w71] Abort: [w71:0] Got completion with error IBV_WC_LOC_LEN_ERR,
code=1, dest rank=8 at line 388 in file viacheck.c
But I had assumed that was a "normal" exit condition for a failed run.
Is this not the case?
Thanks,
Nathan
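
For reference, a minimal sketch of how the two runs suggested in the quoted message below might be scripted. The host names (h1, h2), process count, and ./IMB-MPI1 binary are copied from that example and would need to be replaced with the failing job's own. The try_settings function and the RUN variable are hypothetical helpers, not part of MVAPICH, and the sketch assumes each setting is tried separately rather than combined.

```shell
#!/bin/sh
# try_settings and RUN are hypothetical helper names, not part of MVAPICH.
# RUN defaults to "echo", so the commands are only printed (dry run).
# Set RUN=env to actually launch them.
try_settings() {
    run=${RUN:-echo}
    for setting in VIADEV_USE_SHMEM_COLL=0 VIADEV_USE_COALESCE=0; do
        # Assumption: each setting is tried on its own, one run per setting.
        $run mpirun_rsh -np 4 h1 h1 h2 h2 "$setting" ./IMB-MPI1
    done
}

try_settings
```

Running it unchanged only prints the two command lines, which makes it easy to check them before handing the script to the user.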
Matthew Koop wrote:
> Nathan,
>
> Do you ever see any other errors, or is it just
> IBV_EVENT_QP_LAST_WQE_REACHED? Sometimes when a job fails one process will
> have an error and then the rest of the processes can exit with another
> error status unrelated to the problem.
>
> Can you try running with VIADEV_USE_SHMEM_COLL=0 and see if that makes any
> difference?
>
> e.g.
> mpirun_rsh -np 4 h1 h1 h2 h2 VIADEV_USE_SHMEM_COLL=0 ./IMB-MPI1
>
> If that doesn't work, you can also try:
> VIADEV_USE_COALESCE=0
>
> These will help us narrow down the problem a bit.
>
> Thanks,
> Matt
>
>
> On Tue, 21 Aug 2007, Nathan Dauchy wrote:
>
>> Updated...
>>
>> Nathan Dauchy wrote:
>>> I finally had time to get back to this issue...
>>>
>>> The OSU benchmarks run fine.
>>> The presta benchmarks run fine.
>>
>> IMB now runs fine. I had the library path wrong when compiling.
>>
>>>> Sayantan Sur wrote:
>>>>> Thanks for reporting the problem. The event
>>>>> IBV_EVENT_QP_LAST_WQE_REACHED means that the QP (internal InfiniBand
>>>>> communication channel) is in an error state and all requests are
>>>>> consumed. Could it be related to a setup issue? Can you run any other
>>>>> MPI programs such as OSU benchmarks, IMB etc. on all these nodes?
>>>>>