[mvapich-discuss] viarecv.c:613: viadev_eager_pull: Assertion `rhandle->vbuf_head != ((void *)0)' failed.

Matthew Koop koop at cse.ohio-state.edu
Tue May 1 01:48:15 EDT 2007


Adam,

Based on your line number, it appears that this is the 0.9.7-mlx that
shipped as a part of OFED 1.1. Is this correct? If so, it will be very
hard for us to determine if the issue is still there since it is a
different codebase than the 0.9.7 shipped from OSU.

If you are able to reproduce it in any mode on 0.9.9, please let us know;
we would be very interested in investigating further.

Thanks,
Matt

On Mon, 30 Apr 2007, Adam Moody wrote:

> Hello all,
> One user's code will sometimes die with MVAPICH-1 0.9.7.  On a given
> run, it will randomly lead to one of three outcomes:
>     #1)  viarecv.c:613: viadev_eager_pull: Assertion `rhandle->vbuf_head != ((void *)0)' failed.
>     #2)  MPI_IRECV : Invalid count argument
>     #3)  the code runs without error
> From what I can tell, in case #1, the message that leads to the
> assertion failure is an unexpected eager message of ~1700 bytes from an
> off-node task.  The rhandle shows that vbufs_expected=1, but both
> vbuf_head and vbuf_tail are NULL.
>
> So far, this code runs without error in 0.9.9.  I'd like to determine
> whether 0.9.9 fixes the problem, or whether it's still out there and
> the new optimizations in 0.9.9 merely change the timing enough to
> improve our odds of avoiding it.  Are there any particular fixes in
> 0.9.9 which address the race condition described above?
> Thanks,
> -Adam Moody
> DEG/LLNL
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
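
For anyone trying to reproduce this, the state described in the report
(vbufs_expected=1 with both vbuf_head and vbuf_tail NULL) can be captured
with a small diagnostic instead of a bare assert. The following is only a
sketch under stated assumptions: the struct layouts and the function names
rhandle_ready_for_pull and rhandle_dump are hypothetical stand-ins, not
the MVAPICH definitions. Only the field names vbufs_expected, vbuf_head,
and vbuf_tail are taken from the report.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for illustration only;
 * these are NOT the real MVAPICH structures. */
struct vbuf_sketch {
    struct vbuf_sketch *next;   /* next buffer in the queue */
    size_t len;                 /* payload length in bytes */
};

struct rhandle_sketch {
    int vbufs_expected;             /* field name taken from the report */
    struct vbuf_sketch *vbuf_head;  /* field name taken from the report */
    struct vbuf_sketch *vbuf_tail;  /* field name taken from the report */
};

/* Returns 1 if the handle has a usable buffer queue for an eager pull,
 * 0 otherwise. This mirrors the condition the failing assertion tests
 * (vbuf_head != NULL) and also rejects a half-linked queue. */
int rhandle_ready_for_pull(const struct rhandle_sketch *r)
{
    if (r->vbuf_head == NULL)
        return 0;               /* the case the assertion rejects */
    if (r->vbuf_tail == NULL)
        return 0;               /* head set but tail missing */
    return 1;
}

/* Prints the three fields to stderr so the bad state is recorded
 * before the process aborts. */
void rhandle_dump(const struct rhandle_sketch *r)
{
    fprintf(stderr, "rhandle: vbufs_expected=%d vbuf_head=%p vbuf_tail=%p\n",
            r->vbufs_expected, (void *)r->vbuf_head, (void *)r->vbuf_tail);
}
```

Calling rhandle_dump just before the existing assertion, whenever
rhandle_ready_for_pull returns 0, would log the inconsistent handle on
each failing run, which may help tell a timing-dependent failure apart
from one that has actually been fixed.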
