[mvapich-discuss] Memory error detected by TotalView and Valgrind in MV2-2.1

Adam T. Moody moody20 at llnl.gov
Tue Feb 9 14:08:53 EST 2016


Thanks, Hari.  I also didn't find definitive PMI documentation about the 
semantics in this case.

For now, I'll patch our MV2 to just allocate a byte for 
MPIDI_failed_procs_string.
-Adam


Hari Subramoni wrote:

>Hi Adam,
>
>Thanks for pointing this out.
>
>To the best of our knowledge there is no official standards document for
>PMI, and our understanding is that the documentation in the pmi.h
>file that ships with SLURM and MPICH does not mention anything about it.
>Please correct me if I'm wrong.
>
>However, what you pointed out does look like an issue, and we will
>definitely take care of it in the next release. In our implementation of
>PMI, max_val_len is initialized to 0, which could be why we never saw this
>in our internal testing.
>
>Regards,
>Hari.
>
>On Mon, Feb 8, 2016 at 9:13 PM, Adam T. Moody <moody20 at llnl.gov> wrote:
>
>>I should add that our custom PMI library bails out with an error if you
>>call PMI_KVS_Get_value_length_max() before calling PMI_Init().  In that
>>case PMI_KVS_Get_value_length_max() returns an error (PMI_ERR_INIT) and
>>does not modify the output parameter val.  Because val is never
>>initialized, it can hold any value, and the malloc that follows fails.
>>
>>I see that the MV2 code does not check the PMI return code here.
>>-Adam
>>
>>
>>Adam T. Moody wrote:
>>
>>>Hello MVAPICH team,
>>>We have two different memory debugging tools pointing to an error
>>>around line 299 in src/mpid/ch3/src/mpid_init.c:
>>>
>>>   /* Create the string that will cache the last group of failed processes
>>>    * we received from PMI */
>>>   UPMI_KVS_GET_VALUE_LENGTH_MAX(&val);
>>>   MPIDI_failed_procs_string = MPIU_Malloc(sizeof(char) * (val+1));
>>>
>>>Both tools are reporting that malloc is being called with a large
>>>negative value, implying that val is negative here.
>>>
>>>We have a custom PMI library, and I tracked this down to an issue
>>>where PMI_KVS_Get_value_length_max() is being called before PMI_Init().
>>>
>>>Do you know if that is valid in PMI?
>>>-Adam
>>>_______________________________________________
>>>mvapich-discuss mailing list
>>>mvapich-discuss at cse.ohio-state.edu
>>>http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss


