[mvapich-discuss] viarecv.c:613: viadev_eager_pull: Assertion `rhandle->vbuf_head != ((void *)0)' failed.

Adam Moody moody20 at llnl.gov
Tue May 1 14:14:02 EDT 2007


Yes, thanks.  I haven't yet reproduced it in 0.9.9.  I suppose we could 
try a pure OSU 0.9.7, but I don't think I have a copy around.  Maybe I 
could pull one off of a svn branch or off the trunk given a revision number.
-Adam

Pavel Shamis (Pasha) wrote:

> Hi All,
> viarecv.c:613 is code from mvapich-0.9.7-mlx2.2.0.
> The same code exists in 0.9.9, at viarecv.c line 555.
> I will try to analyze the issue in mlx2.2.0.
>
> Regards,
> Pasha
>
> Matthew Koop wrote:
>
>> Adam,
>>
>> Based on your line number, it appears that this is the 0.9.7-mlx that
>> shipped as a part of OFED 1.1. Is this correct? If so, it will be very
>> hard for us to determine if the issue is still there since it is a
>> different codebase than the 0.9.7 shipped from OSU.
>>
>> If you are able to reproduce it in any mode on 0.9.9, please let us know
>> and we'll be very interested to investigate further.
>>
>> Thanks,
>> Matt
>>
>> On Mon, 30 Apr 2007, Adam Moody wrote:
>>
>>> Hello all,
>>> One user's code will sometimes die with MVAPICH-1 0.9.7.  On a given
>>> run, it will randomly lead to one of three outcomes:
>>>     #1)  viarecv.c:613: viadev_eager_pull: Assertion `rhandle->vbuf_head != ((void *)0)' failed.
>>>     #2)  MPI_IRECV : Invalid count argument
>>>     #3)  the code runs without error
>>> From what I can tell, in case #1, the message that leads to the
>>> assertion failure is an unexpected eager message of ~1700 bytes from an
>>> off-node task.  The rhandle shows that vbufs_expected=1, but both
>>> vbuf_head and vbuf_tail are NULL.
>>>
>>> So far, this code runs without error in 0.9.9.  I'd like to determine
>>> whether 0.9.9 fixes the problem, or whether it is still there and the
>>> new optimizations in 0.9.9 merely change the timing enough to improve
>>> our odds of avoiding it.  Are there any particular fixes in 0.9.9 that
>>> address the race condition described above?
>>> Thanks,
>>> -Adam Moody
>>> DEG/LLNL
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

