[mvapich-discuss] Memory error detected by TotalView and Valgrind in MV2-2.1

Hari Subramoni subramoni.1 at osu.edu
Mon Feb 8 22:30:22 EST 2016


Hi Adam,

Thanks for pointing this out.

To the best of our knowledge, there is no official standards document for
PMI, and our understanding is that the documentation in the pmi.h file that
ships with SLURM and MPICH does not say whether
PMI_KVS_Get_value_length_max() may be called before PMI_Init().
Please correct me if I'm wrong.

However, what you pointed out does look like an issue and we will
definitely take care of it in the next release. In our implementation of
PMI, the max_val_len is initialized to 0. That could be why we never saw it
in our internal testing before.
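
For illustration, here is a minimal sketch of the kind of guard that would
avoid the bad allocation: give val a defined starting value, check the
return code, and reject a negative length before calling malloc. This is
not the MVAPICH2 fix. Every sketch_* name and SKETCH_* constant below is a
hypothetical stand-in (the real return codes come from pmi.h), and the stub
only imitates a PMI library that returns an error before initialization and
leaves the output parameter untouched, as Adam describes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for PMI return codes; real values come from pmi.h. */
#define SKETCH_PMI_SUCCESS  0
#define SKETCH_PMI_ERR_INIT 1

/* Hypothetical flag: nonzero once the (stubbed) PMI layer is initialized. */
int sketch_pmi_initialized = 0;

/* Stub imitating a PMI library that rejects the query before init and
 * does not modify *length in that case. */
int sketch_get_value_length_max(int *length)
{
    assert(length != NULL);
    if (!sketch_pmi_initialized)
        return SKETCH_PMI_ERR_INIT;
    *length = 1024;  /* arbitrary value chosen for this sketch */
    return SKETCH_PMI_SUCCESS;
}

/* Allocate the buffer only if the query succeeded and reported a
 * non-negative length. Returns NULL on any failure. */
char *sketch_alloc_failed_procs_string(void)
{
    int val = -1;  /* defined sentinel instead of an uninitialized value */
    int rc = sketch_get_value_length_max(&val);

    if (rc != SKETCH_PMI_SUCCESS || val < 0)
        return NULL;

    /* Widen to size_t before adding 1 so the size is computed unsigned. */
    return malloc((size_t)val + 1);
}
```

With the stub uninitialized, sketch_alloc_failed_procs_string() returns NULL
rather than passing an indeterminate size to malloc. After the flag is set,
it returns a buffer that the caller must free.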

Regards,
Hari.

On Mon, Feb 8, 2016 at 9:13 PM, Adam T. Moody <moody20 at llnl.gov> wrote:

> I should add that our custom PMI library bails out with an error if you
> call PMI_KVS_Get_value_length_max() before calling PMI_Init().  So
> PMI_KVS_Get_value_length_max() returns an error (PMI_ERR_INIT) and it
> does not modify the output parameter val.  The malloc then fails because
> val is not initialized, and it can take on random values.
>
> I see that the MV2 code does not check the PMI return code here.
> -Adam
>
>
> Adam T. Moody wrote:
>
> > Hello MVAPICH team,
> > We have two different memory debugging tools pointing to an error
> > around line 299 in src/mpid/ch3/src/mpid_init.c:
> >
> >    /* Create the string that will cache the last group of failed processes
> >     * we received from PMI */
> >    UPMI_KVS_GET_VALUE_LENGTH_MAX(&val);
> >    MPIDI_failed_procs_string = MPIU_Malloc(sizeof(char) * (val+1));
> >
> > Both tools are reporting that malloc is being called with a large
> > negative value, implying that val is negative here.
> >
> > We have a custom PMI library, and I tracked this down to an issue
> > where PMI_KVS_Get_value_length_max() is being called before PMI_Init().
> >
> > Do you know if that is valid in PMI?
> > -Adam
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss at cse.ohio-state.edu
> > http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss