[mvapich-discuss] viarecv.c:613: viadev_eager_pull: Assertion `rhandle->vbuf_head != ((void *)0)' failed.
Adam Moody
moody20 at llnl.gov
Tue May 1 14:14:02 EDT 2007
Yes, thanks. I haven't yet reproduced it in 0.9.9. I suppose we could
try a pure OSU 0.9.7, but I don't think I have a copy around. Maybe I
could pull one from an svn branch or from the trunk, given a revision number.
-Adam
Pavel Shamis (Pasha) wrote:
> Hi All,
> viarecv.c:613 is code from mvapich-0.9.7-mlx2.2.0.
> The same code exists in 0.9.9 at viarecv.c line 555.
> I will try to analyze the issue in mlx2.2.0.
>
> Regards,
> Pasha
>
> Matthew Koop wrote:
>
>> Adam,
>>
>> Based on your line number, it appears that this is the 0.9.7-mlx that
>> shipped as a part of OFED 1.1. Is this correct? If so, it will be very
>> hard for us to determine if the issue is still there since it is a
>> different codebase than the 0.9.7 shipped from OSU.
>>
>> If you are able to reproduce in any mode on 0.9.9 please let us know and
>> we'll be very interested to investigate further.
>>
>> Thanks,
>> Matt
>>
>> On Mon, 30 Apr 2007, Adam Moody wrote:
>>
>>
>>
>>> Hello all,
>>> One user's code will sometimes die with MVAPICH-1 0.9.7. On a given
>>> run, it will randomly lead to one of three outcomes:
>>> #1) viarecv.c:613: viadev_eager_pull: Assertion `rhandle->vbuf_head != ((void *)0)' failed.
>>> #2) MPI_IRECV : Invalid count argument
>>> #3) the code runs without error
>>> From what I can tell, in case #1, the message that leads to the
>>> assertion failure is an unexpected eager message of ~1700 bytes from an
>>> off-node task. The rhandle shows that vbufs_expected=1, but both
>>> vbuf_head and vbuf_tail are NULL.
>>>
>>> So far, this code runs without error in 0.9.9. I'd like to determine
>>> whether 0.9.9 fixes the problem, or whether it's still out there and
>>> the new optimizations in 0.9.9 merely change timings enough to
>>> improve our odds of avoiding it. Are there any particular fixes in
>>> 0.9.9 which address the race condition described above?
>>> Thanks,
>>> -Adam Moody
>>> DEG/LLNL
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss