[mvapich-discuss] Infinite loop in ptmalloc

Adam T. Moody moody20 at llnl.gov
Tue Dec 16 20:09:55 EST 2014


Hi Jonathan,
I commented out the "#if defined _LIBC || defined MALLOC_HOOKS" check in
ptmalloc_unlock_all2(), around line 267 of
mpid/ch3/channels/common/src/memory/ptmalloc2/arena.c, to
force the following three lines to be compiled into the library:

tsd_setspecific(arena_key, save_arena);
__malloc_hook = save_malloc_hook;
__free_hook = save_free_hook;

This seems to work around the problem.
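
A minimal sketch of why the guard matters, assuming a stand-in hook pointer rather than the real __malloc_hook. Every name in it (alloc_hook, demo_lock_all, demo_unlock_all2, DEMO_LIBC, DEMO_MALLOC_HOOKS, hook_leaks_into_child) is hypothetical and not taken from the MVAPICH2 sources. It only illustrates that when neither guard macro is defined, the restore statement is dropped by the preprocessor and the fork-time hook stays installed:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for __malloc_hook: a pointer consulted on allocation. */
typedef void *(*alloc_hook_t)(size_t);

static alloc_hook_t alloc_hook = NULL;      /* currently active hook        */
static alloc_hook_t save_alloc_hook = NULL; /* value saved before the fork  */

/* Stand-in for malloc_atfork(): the hook used while a fork is in progress. */
static void *atfork_alloc(size_t n) { return malloc(n); }

/* Prepare step: remember the current hook, then install the fork-time hook. */
void demo_lock_all(void)
{
    save_alloc_hook = alloc_hook;
    alloc_hook = atfork_alloc;
}

/* Child step: the restore is compiled in only if a guard macro is defined. */
void demo_unlock_all2(void)
{
#if defined DEMO_LIBC || defined DEMO_MALLOC_HOOKS
    alloc_hook = save_alloc_hook;
#endif
}

/* Returns 1 if the fork-time hook is still installed after the child step. */
int hook_leaks_into_child(void)
{
    alloc_hook = NULL;
    demo_lock_all();
    demo_unlock_all2();
    return alloc_hook == atfork_alloc;
}
```

Built without either macro, hook_leaks_into_child() returns 1. Built with -DDEMO_MALLOC_HOOKS, the restore is compiled in and it returns 0.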

Can you take a closer look at this #if to double-check whether it should 
be there?
-Adam



Jonathan Perkins wrote:

>Hi Adam.  Thanks for the additional information.  We've received some
>reports about problems with our internal ptmalloc2 implementation and
>python applications misbehaving when used together.
>
>I'm glad that you have a workaround for the time being.  I'll touch
>base with you again once we've been able to create a reproducer and
>have more insight into what is happening in this case.
>
>On Tue, Dec 16, 2014 at 02:44:39PM -0800, Adam T. Moody wrote:
>
>>Hi Jonathan,
>>I'll look into a reproducer.  Right now, it's not trivial to reproduce.
>>It's a python app that uses MPI.  The python process uses Popen to fork and
>>exec "cat" to read "/proc/cpuinfo".  It's this child process that then gets
>>stuck in the infinite recursion loop.  We found that a workaround is to use
>>Popen to start a shell, which then cats the file... ugh.
>>
>>One thing I can see under TotalView is that ptmalloc_unlock_all2 is defined
>>and therefore apparently used; however, the body of the #if below was not
>>compiled into the library:
>>
>>#if defined _LIBC || defined MALLOC_HOOKS
>> tsd_setspecific(arena_key, save_arena);
>> __malloc_hook = save_malloc_hook;
>> __free_hook = save_free_hook;
>>#endif
>>
>>These lines look to be responsible for restoring the original hooks in the
>>child process.  I'm guessing that's important.  Apparently, neither _LIBC
>>nor MALLOC_HOOKS is defined.  It looks like this is the only place
>>MALLOC_HOOKS is referenced in all of the source code, which leads me to
>>believe it is deprecated.  I'm guessing _LIBC is the critical one here.
>>Should this macro be defined?
>>-Adam
>>
>>
>>Jonathan Perkins wrote:
>>
>>>Hi Adam.  Thanks for the report and debugging info.  We're inspecting
>>>this code path.  In the meantime, can you provide us with a simple
>>>reproducer to help us investigate this further?
>>>
>>>On Mon, Dec 15, 2014 at 05:53:53PM -0800, Adam T. Moody wrote:
>>>
>>>>Hello MVAPICH team,
>>>>We have a code using MVAPICH2-1.9 that forks a process whose child then dies
>>>>after it eventually consumes all available memory.  If I SIGSTOP the child
>>>>and attach to it before it dies, I can see from its stack trace that it's
>>>>apparently in an infinite recursion loop consisting of calls to:
>>>>
>>>>malloc_atfork()
>>>>malloc() at mvapich_malloc.c:3403
>>>>
>>>>I can see that mvapich_malloc.c:3403 is the last line of the following,
>>>>which invokes the __malloc_hook function pointer:
>>>>
>>>>__malloc_ptr_t (*hook) __MALLOC_P ((size_t, __const __malloc_ptr_t)) =
>>>> __malloc_hook;
>>>>if (hook != NULL)
>>>> return (*hook)(bytes, RETURN_ADDRESS (0));
>>>>
>>>>From the stack trace, I can deduce that __malloc_hook must be pointing to
>>>>malloc_atfork().
>>>>
>>>>Then, looking at the malloc_atfork() implementation, I can see that it calls
>>>>public_mALLOc() in its else clause, which seems like it may be the code
>>>>path leading to the loop:
>>>>
>>>>} else {
>>>> /* Suspend the thread until the `atfork' handlers have completed.
>>>>    By that time, the hooks will have been reset as well, so that
>>>>    mALLOc() can be used again. */
>>>> (void)mutex_lock(&list_lock);
>>>> (void)mutex_unlock(&list_lock);
>>>> return public_mALLOc(sz);
>>>>}
>>>>
>>>>Do you have ideas how this might happen?  Can you imagine a case that would
>>>>lead to a loop here?
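
One way such a loop could arise, sketched with stand-in names: if the hook still points at the fork-time allocator, and that allocator calls back into the public entry point, each call re-enters the hook. The names demo_malloc, demo_malloc_atfork, alloc_hook, and MAX_DEPTH below are hypothetical and not from the MVAPICH2 sources. The depth cutoff exists only so the demonstration terminates; the reported code path has no such limit.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for __malloc_hook. */
typedef void *(*alloc_hook_t)(size_t);

static alloc_hook_t alloc_hook = NULL;
static int depth = 0;
enum { MAX_DEPTH = 8 }; /* demo-only cutoff so the recursion stops */

void *demo_malloc(size_t n);

/* Mirrors the quoted else-branch: call back into the public entry point. */
static void *demo_malloc_atfork(size_t n)
{
    return demo_malloc(n);
}

/* Mirrors the quoted hook check at the top of the public malloc(). */
void *demo_malloc(size_t n)
{
    alloc_hook_t hook = alloc_hook;
    if (hook != NULL) {
        if (++depth >= MAX_DEPTH)
            return NULL;      /* give up instead of recursing forever */
        return hook(n);       /* re-enters demo_malloc_atfork()       */
    }
    return malloc(n);
}

/* Leaves the fork-time hook installed and reports how deep the calls went. */
int demo_recursion_depth(void)
{
    depth = 0;
    alloc_hook = demo_malloc_atfork;
    void *p = demo_malloc(16);
    free(p);                  /* p is NULL here; free(NULL) is a no-op */
    alloc_hook = NULL;
    return depth;
}
```

With the hook left installed, demo_recursion_depth() reaches the cutoff of 8. With the hook reset to NULL, demo_malloc() falls through to malloc() on the first call.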
>>>>
>>>>I see a lock followed immediately by an unlock.  Does this lock really
>>>>protect anything?
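
Going by the comment in the quoted excerpt ("Suspend the thread until the `atfork' handlers have completed"), the lock/unlock pair appears to act as a wait rather than as protection for a critical section. A minimal sketch of that pattern with POSIX threads follows, under the assumption that another thread holds the lock for the duration of the work being waited on. The names handlers_done, observed, waiter, and demo_lock_as_barrier are hypothetical; only list_lock echoes the excerpt.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static int handlers_done = 0; /* set by the lock holder before it unlocks */
static int observed = -1;     /* what the waiting thread saw afterwards   */

/* Waiter: lock then unlock, purely to block until the holder releases. */
static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&list_lock);
    pthread_mutex_unlock(&list_lock);
    observed = handlers_done;
    return NULL;
}

/* Holder: takes the lock first, finishes its work, then releases. */
int demo_lock_as_barrier(void)
{
    pthread_t t;

    handlers_done = 0;
    observed = -1;
    pthread_mutex_lock(&list_lock);
    if (pthread_create(&t, NULL, waiter, NULL) != 0) {
        pthread_mutex_unlock(&list_lock);
        return -1;
    }
    handlers_done = 1;        /* the "work" the waiter must not overtake */
    pthread_mutex_unlock(&list_lock);
    pthread_join(t, NULL);
    return observed;
}
```

Because the waiter cannot get past the lock until the holder releases it, demo_lock_as_barrier() returns 1: the waiter always sees the completed state. On older toolchains this needs -pthread at link time.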
>>>>Thanks,
>>>>-Adam
>>>>_______________________________________________
>>>>mvapich-discuss mailing list
>>>>mvapich-discuss at cse.ohio-state.edu
>>>>http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss