[mvapich-discuss] Infinite loop in ptmalloc

Adam T. Moody moody20 at llnl.gov
Tue Dec 16 21:12:45 EST 2014


Hi Jonathan,
I've come up with a simple reproducer in C:

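#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    /* fork after MPI_Init, while MVAPICH's internal ptmalloc2 is active */
    pid_t pid = fork();

    if (pid == 0) {
        /* child: realloc followed by malloc is the sequence that ends up
           in the recursion (see point 2 below) */
        void* buf = malloc(2);
        buf = realloc(buf, 4);
        void* buf2 = malloc(6);
        free(buf);
        free(buf2);
        return 0;
    } else {
        /* parent: keep the job alive while the child runs */
        sleep(300);
    }

    MPI_Finalize();

    return 0;
}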

You can run this as a single-task MPI job.  In my testing, the child 
process ends up in infinite recursion, as I've described before.  There 
are potentially two separate bugs:

1) The malloc hooks are not being restored after fork in 
ptmalloc_unlock_all2 in the child process due to the "#if _LIBC || 
MALLOC_HOOKS" guard at arena.c:269.

2) An application call to realloc updates the last arena id via 
tsd_setspecific at mvapich_malloc.c:3594.  If the app calls realloc and 
then malloc while the malloc_atfork hook is still installed, MVAPICH 
enters the recursion loop.  It seems realloc is the only wrapper that 
calls tsd_setspecific, so perhaps it shouldn't, or perhaps it needs a 
realloc_atfork hook?

Please double check me on this.
-Adam
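
Point 1 rests on the fork-handler pattern: whatever a prepare handler changes before fork() stays changed in the child unless the child handler puts it back. A minimal POSIX sketch of that pattern, assuming ptmalloc2 registers its lock/unlock routines as prepare/parent/child handlers (as upstream ptmalloc2 does; not confirmed for MVAPICH here). The int hook_state and the handler names are hypothetical stand-ins for the saved hook pointers and ptmalloc_lock_all / ptmalloc_unlock_all / ptmalloc_unlock_all2.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins, not MVAPICH symbols. */
static int hook_state = 0;        /* 0 = normal hook, 1 = atfork hook */
static int saved_state;
static int restore_in_child = 1;  /* models whether the #if block is compiled */

static void prepare(void)         /* like ptmalloc_lock_all */
{
    saved_state = hook_state;
    hook_state = 1;
}

static void parent(void)          /* like ptmalloc_unlock_all */
{
    hook_state = saved_state;
}

static void child(void)           /* like ptmalloc_unlock_all2 */
{
    if (restore_in_child)
        hook_state = saved_state;
}

/* Forks once and returns the hook state the child observed (passed back
 * through its exit status), or -1 on error. */
int child_hook_state(int restore)
{
    static int registered = 0;
    if (!registered) {
        if (pthread_atfork(prepare, parent, child) != 0)
            return -1;
        registered = 1;
    }
    restore_in_child = restore;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(hook_state);        /* report what the child sees */

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

child_hook_state(0) returns 1 (the child still sees the prepare-time value), and child_hook_state(1) returns 0; in both cases the parent handler restores the parent's own state.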



Jonathan Perkins wrote:

>Sure thing.
>On Dec 16, 2014 8:10 PM, "Adam T. Moody" <moody20 at llnl.gov> wrote:
>
>>Hi Jonathan,
>>I commented out the #if _LIBC check in ptmalloc_unlock_all2(), around line
>>267 in mpid/ch3/channels/common/src/memory/ptmalloc2/arena.c to force the
>>following three lines to be compiled into the library:
>>
>>tsd_setspecific(arena_key, save_arena);
>>__malloc_hook = save_malloc_hook;
>>__free_hook = save_free_hook;
>>
>>This seems to work around the problem.
>>
>>Can you take a closer look at this #if to double-check whether it should
>>be there?
>>-Adam
>>
>>
>>
>>Jonathan Perkins wrote:
>>
>>>Hi Adam.  Thanks for the additional information.  We've received some
>>>reports of our internal ptmalloc2 implementation and Python
>>>applications misbehaving when used together.
>>>
>>>I'm glad that you have a workaround for the time being.  I'll touch
>>>base with you again once we've been able to create a reproducer and
>>>have more insight into what is happening in this case.
>>>
>>>On Tue, Dec 16, 2014 at 02:44:39PM -0800, Adam T. Moody wrote:
>>>
>>>>Hi Jonathan,
>>>>I'll look into a reproducer.  Right now, it's not trivial to reproduce.
>>>>It's a Python app that uses MPI.  The Python process uses Popen to fork
>>>>and exec "cat" to read "/proc/cpuinfo".  It's this child process that
>>>>then gets stuck in the infinite recursion loop.  We found that a
>>>>workaround is to use Popen to start a shell, which then cats the
>>>>file... ugh.
>>>>
>>>>One thing I can see under TotalView is that ptmalloc_unlock_all2 is
>>>>defined and therefore apparently used; however, the #if block below was
>>>>not compiled into the library:
>>>>
>>>>#if defined _LIBC || defined MALLOC_HOOKS
>>>>tsd_setspecific(arena_key, save_arena);
>>>>__malloc_hook = save_malloc_hook;
>>>>__free_hook = save_free_hook;
>>>>#endif
>>>>
>>>>These lines appear to be responsible for restoring the original hooks
>>>>in the child process, which I'd guess is important.  Apparently,
>>>>neither _LIBC nor MALLOC_HOOKS is defined.  It looks like this is the
>>>>only place MALLOC_HOOKS is mentioned in all of the source code, which
>>>>leads me to believe it is deprecated.  I'm guessing _LIBC is the
>>>>critical one here.  Should this macro be defined?
>>>>-Adam
>>>>
>>>>
>>>>Jonathan Perkins wrote:
>>>>
>>>>
>>>>
>>>>        
>>>>
>>>>>Hi Adam.  Thanks for the report and debugging info.  We're inspecting
>>>>>this code path.  In the meantime, can you provide us with a simple
>>>>>reproducer to help us investigate this further?
>>>>>
>>>>>On Mon, Dec 15, 2014 at 05:53:53PM -0800, Adam T. Moody wrote:
>>>>>
>>>>>
>>>>>
>>>>>          
>>>>>
>>>>>>Hello MVAPICH team,
>>>>>>We have a code using MVAPICH2-1.9 that forks a process whose child then
>>>>>>dies after it eventually consumes all available memory.  If I SIGSTOP
>>>>>>the child and attach to it before it dies, I can see from its stack
>>>>>>trace that it's apparently in an infinite recursion loop consisting of
>>>>>>calls to:
>>>>>>
>>>>>>malloc_atfork()
>>>>>>malloc() at mvapich_malloc.c:3403
>>>>>>
>>>>>>I can see that mvapich_malloc.c:3403 is the last line of the following,
>>>>>>which invokes the __malloc_hook function pointer:
>>>>>>
>>>>>>__malloc_ptr_t (*hook) __MALLOC_P ((size_t, __const __malloc_ptr_t)) =
>>>>>>  __malloc_hook;
>>>>>>if (hook != NULL)
>>>>>>  return (*hook)(bytes, RETURN_ADDRESS (0));
>>>>>>
>>>>>>From the stack trace, I can deduce that __malloc_hook must be pointing
>>>>>>to malloc_atfork().
>>>>>>
>>>>>>Then, looking at the malloc_atfork() implementation, I can see that it
>>>>>>calls public_mALLOc() in its else clause, which seems like it may be
>>>>>>the code path leading to the loop:
>>>>>>
>>>>>>} else {
>>>>>>  /* Suspend the thread until the `atfork' handlers have completed.
>>>>>>     By that time, the hooks will have been reset as well, so that
>>>>>>     mALLOc() can be used again. */
>>>>>>  (void)mutex_lock(&list_lock);
>>>>>>  (void)mutex_unlock(&list_lock);
>>>>>>  return public_mALLOc(sz);
>>>>>>}
>>>>>>
>>>>>>Do you have ideas how this might happen?  Can you imagine a case that
>>>>>>would lead to a loop here?
>>>>>>
>>>>>>I see a lock followed immediately by an unlock.  Does this lock really
>>>>>>protect anything?
>>>>>>Thanks,
>>>>>>-Adam
>>>>>>_______________________________________________
>>>>>>mvapich-discuss mailing list
>>>>>>mvapich-discuss at cse.ohio-state.edu
>>>>>>http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
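
On the question in the original report of whether a lock immediately followed by an unlock protects anything: in the malloc_atfork else branch it acts as a wait rather than a critical section. Going by the code comment there, a thread that calls malloc while the atfork handlers hold list_lock cannot pass the lock/unlock pair until they release it. The sketch below isolates that idiom with plain pthreads; list_lock, handlers_done, waiter, and barrier_demo are hypothetical stand-ins, not MVAPICH code. In the single-threaded child of the reproducer nothing holds the lock, so the pair falls straight through to public_mALLOc.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the MVAPICH symbols. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static int handlers_done = 0;  /* written only while list_lock is held */

/* A thread that reaches the malloc_atfork-style else branch. */
static void *waiter(void *arg)
{
    int *seen = arg;
    /* Lock immediately followed by unlock: no data is protected here, but
       the pair cannot complete while another thread holds list_lock. */
    pthread_mutex_lock(&list_lock);
    pthread_mutex_unlock(&list_lock);
    *seen = handlers_done;     /* ordered after the holder's unlock */
    return NULL;
}

/* Returns what the waiter observed (always 1), or -1 on error. */
int barrier_demo(void)
{
    int seen = -1;
    pthread_t t;

    handlers_done = 0;
    pthread_mutex_lock(&list_lock);            /* like the prepare handler */
    if (pthread_create(&t, NULL, waiter, &seen) != 0) {
        pthread_mutex_unlock(&list_lock);
        return -1;
    }
    handlers_done = 1;                         /* hooks reset, handlers done */
    pthread_mutex_unlock(&list_lock);          /* like the parent handler */
    if (pthread_join(t, NULL) != 0)
        return -1;
    return seen;
}
```

barrier_demo() returns 1 regardless of scheduling: whether the waiter starts before or after the unlock, it cannot read handlers_done until the holder has released list_lock.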


