[mvapich-discuss] Assertion failure

Martin Pokorny mpokorny at nrao.edu
Tue Jan 22 12:08:56 EST 2013


On 01/22/2013 09:44 AM, Martin Pokorny wrote:
> On 01/16/2013 07:54 AM, Devendar Bureddy wrote:
>> It seems the assertion is hit during a large message transfer with
>> the non-default RNDV protocol, i.e. R3 (the default is RPUT). A large
>> message transfer should switch to the R3 protocol only if the IB
>> memory registration fails internally.
>>
>> Can you try forcing all large message transfers to the R3 protocol
>> using the run-time parameter MV2_RNDV_PROTOCOL=R3, and see if this
>> increases the frequency of failure?
>
> Good suggestion! When I set MV2_RNDV_PROTOCOL=R3, the failure occurs in
> every trial, immediately upon opening a file. From the backtrace I see
> that none of the modified ADIO Lustre code has been called, so I'm
> tempted to eliminate my modifications as a possible cause of the error
> (although I'm willing to do more to confirm that).

A brief follow-up: I just restored the original ADIO Lustre code, and the
failure still occurs.
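
For anyone who wants to compare failure frequency between the default
protocol and R3, the following is a minimal sketch of a trial loop, not
the procedure actually used in this thread. It assumes the MPI launcher
passes its own environment on to the ranks; with mpirun_rsh the variable
is usually given on the command line instead, as NAME=VALUE before the
executable. The helper names and the open_test program in the usage
comment are hypothetical.

```python
import os
import subprocess
import sys


def env_with_protocol(protocol, base=None):
    """Return a copy of the environment with MV2_RNDV_PROTOCOL set."""
    env = dict(os.environ if base is None else base)
    env["MV2_RNDV_PROTOCOL"] = protocol
    return env


def count_failures(cmd, protocol, trials):
    """Run cmd `trials` times and count the runs that exit non-zero."""
    failures = 0
    for _ in range(trials):
        result = subprocess.run(cmd, env=env_with_protocol(protocol))
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # Hypothetical usage: python trials.py mpiexec -n 4 ./open_test
    cmd = sys.argv[1:]
    if cmd:
        for protocol in ("RPUT", "R3"):
            failed = count_failures(cmd, protocol, trials=10)
            print(protocol, failed, "failures out of 10 trials")
    else:
        print("usage: trials.py <launcher command...>")
```

A run that aborts on the assertion is assumed to make the launcher
return a non-zero exit status, which is what the loop counts.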

-- 
Martin Pokorny
Software Engineer - Karl G. Jansky Very Large Array
National Radio Astronomy Observatory - New Mexico Operations

