[mvapich-discuss] FATAL event IBV_EVENT_QP_LAST_WQE_REACHED

Matthew Koop koop at cse.ohio-state.edu
Tue Sep 4 12:28:28 EDT 2007


Nathan,

Thanks. Let us know if the user sees these errors with either of these
settings. Also, if there is any test code you can give us it would be much
appreciated.

The IBV_WC_LOC_LEN_ERR error is more likely to be the real error (it is
not normal), although tracking it without a reproducer or some idea as to
the MPI operations being used will be difficult.

Matt
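
[Editor's sketch] The distinction above, between a work-completion error
such as IBV_WC_LOC_LEN_ERR and the asynchronous event
IBV_EVENT_QP_LAST_WQE_REACHED, can be checked quickly across a job's
collected output. The Python below is only a minimal sketch: the
regular expression is modeled on the single abort line quoted in this
thread, and the names ABORT_RE, EVENT_RE and triage are hypothetical,
not part of MVAPICH. Other MVAPICH versions may format the line
differently.

```python
import re

# Assumption: abort lines look like the one quoted in this thread, e.g.
# "... Abort: [w71:0] Got completion with error IBV_WC_LOC_LEN_ERR,
#  code=1, dest rank=8 at line 388 in file viacheck.c"
ABORT_RE = re.compile(
    r"Abort:\s+\[(?P<host>[^:\]]+):(?P<rank>\d+)\]\s+"
    r"Got completion with error\s+(?P<status>IBV_WC_\w+),\s*"
    r"code=(?P<code>\d+),\s*dest rank=(?P<dest>\d+)\s+"
    r"at line\s+(?P<line>\d+)\s+in file\s+(?P<file>\S+)"
)

# Asynchronous verbs events are matched by name only, because the exact
# wording of the surrounding "FATAL event" line is not shown in the thread.
EVENT_RE = re.compile(r"IBV_EVENT_\w+")


def triage(text):
    """Return (completion_errors, async_event_names) found in job output."""
    completions = [m.groupdict() for m in ABORT_RE.finditer(text)]
    events = sorted(set(EVENT_RE.findall(text)))
    return completions, events


if __name__ == "__main__":
    sample = (
        "[0:w71] Abort: [w71:0] Got completion with error "
        "IBV_WC_LOC_LEN_ERR, code=1, dest rank=8 at line 388 "
        "in file viacheck.c\n"
        "FATAL event IBV_EVENT_QP_LAST_WQE_REACHED\n"
    )
    completions, events = triage(sample)
    for c in completions:
        print(c["host"], c["rank"], c["status"], "dest rank", c["dest"])
    print("async events:", events)
```

Any rank that reports a completion error is a better starting point for
debugging than ranks that only report the asynchronous event.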

On Thu, 30 Aug 2007, Nathan Dauchy wrote:

> Matt,
>
> I'll try to get the user who is having this problem to run with those
> environment variables set.
>
> They did report this error as well:
>
> [0:w71] Abort: [w71:0] Got completion with error IBV_WC_LOC_LEN_ERR,
> code=1, dest rank=8 at line 388 in file viacheck.c
>
> But I had assumed that was a "normal" exit condition for a failed run.
> Is this not the case?
>
> Thanks,
> Nathan
>
>
> Matthew Koop wrote:
> > Nathan,
> >
> > Do you ever see any other errors, or is it just
> > IBV_EVENT_QP_LAST_WQE_REACHED? Sometimes when a job fails one process will
> > have an error and then the rest of the processes can exit with another
> > error status unrelated to the problem.
> >
> > Can you try running with VIADEV_USE_SHMEM_COLL=0 and see if that makes any
> > difference?
> >
> > e.g.
> > mpirun_rsh -np 4 h1 h1 h2 h2 VIADEV_USE_SHMEM_COLL=0 ./IMB-MPI1
> >
> > If that doesn't work, you can also try:
> > VIADEV_USE_COALESCE=0
> >
> > These will help us narrow down the problem a bit.
> >
> > Thanks,
> > Matt
> >
> >
> > On Tue, 21 Aug 2007, Nathan Dauchy wrote:
> >
> >> Updated...
> >>
> >> Nathan Dauchy wrote:
> >>> I finally had time to get back to this issue...
> >>>
> >>> The OSU benchmarks run fine.
> >>> The presta benchmarks run fine.
> >>
> >> IMB now runs fine.  I had the library path wrong when compiling.
> >>
> >>>> Sayantan Sur wrote:
> >>>>> Thanks for reporting the problem. The event
> >>>>> IBV_EVENT_QP_LAST_WQE_REACHED means that the QP (internal InfiniBand
> >>>>> communication channel) is in an error state and all requests are
> >>>>> consumed. Could it be related to a setup issue? Can you run any other
> >>>>> MPI programs such as OSU benchmarks, IMB etc. on all these nodes?
> >>>>>
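
[Editor's sketch] The two diagnostic runs suggested in the quoted
message (first VIADEV_USE_SHMEM_COLL=0, then VIADEV_USE_COALESCE=0) can
be generated from one small helper so that only the setting changes
between runs. This is a minimal Python sketch, not an MVAPICH tool: the
function name mpirun_rsh_cmd is hypothetical, the hosts and benchmark
binary are copied from the example command in the thread, and placing
VIADEV_USE_COALESCE=0 in the same position as the first variable is an
assumption based on that example.

```python
import shlex


def mpirun_rsh_cmd(nprocs, hosts, env, program):
    """Build an mpirun_rsh argument list.

    NAME=VALUE settings go after the host list and before the program,
    mirroring the example command quoted in the thread.
    """
    settings = [f"{name}={value}" for name, value in env.items()]
    return ["mpirun_rsh", "-np", str(nprocs), *hosts, *settings, program]


if __name__ == "__main__":
    hosts = ["h1", "h1", "h2", "h2"]      # hosts from the thread's example
    runs = [
        {"VIADEV_USE_SHMEM_COLL": "0"},   # first suggestion
        {"VIADEV_USE_COALESCE": "0"},     # second suggestion (placement assumed)
    ]
    for env in runs:
        # Print the command instead of running it; paste it into a shell
        # or hand the list to subprocess.run() on a node with mpirun_rsh.
        print(shlex.join(mpirun_rsh_cmd(4, hosts, env, "./IMB-MPI1")))
```

Changing one setting per run keeps any change in behavior attributable
to a single variable.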


