[mvapich-discuss] Infinite loop in ptmalloc

Adam T. Moody moody20 at llnl.gov
Tue Dec 16 17:44:39 EST 2014


Hi Jonathan,
I'll look into a reproducer.  Right now, it's not trivial to reproduce.
It's a Python app that uses MPI.  The Python process uses Popen to fork
and exec "cat" to read "/proc/cpuinfo".  It's this child process that
then gets stuck in the infinite recursion loop.  We found that a
workaround is to use Popen to start a shell, which then cats the file... ugh.
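
For reference, here is a rough C sketch of the fork/exec step described
above.  It is not the Python code and not what Popen does internally; it
only shows the same shape: fork, redirect stdout into a pipe, exec "cat".
The helper name read_via_cat and its signature are made up for
illustration.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "cat <path>" in a child process and copy up to
 * cap-1 bytes of its output into buf (NUL-terminated).
 * Returns the number of bytes stored, or -1 on any failure. */
static long read_via_cat(const char *path, char *buf, size_t cap)
{
    int fds[2];
    if (cap == 0 || pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: any malloc() made here, before exec, goes through
         * whatever allocator hooks were inherited across fork(). */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execlp("cat", "cat", path, (char *)NULL);
        _exit(127);                      /* exec failed */
    }

    /* Parent: read the child's output from the pipe. */
    close(fds[1]);
    size_t total = 0;
    char scratch[4096];
    for (;;) {
        /* Once buf is full, keep draining so cat never hits a closed pipe. */
        int full = (total >= cap - 1);
        ssize_t n = full ? read(fds[0], scratch, sizeof scratch)
                         : read(fds[0], buf + total, cap - 1 - total);
        if (n <= 0)
            break;
        if (!full)
            total += (size_t)n;
    }
    buf[total] = '\0';
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (long)total;
}
```

Called as read_via_cat("/proc/cpuinfo", buf, sizeof buf), this would
exercise the same fork-then-exec window in a C program linked against the
MPI library.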

One thing I can see under Totalview is that ptmalloc_unlock_all2 is
defined and therefore apparently used; however, the code inside the #if
below was not compiled into the library:

#if defined _LIBC || defined MALLOC_HOOKS
  tsd_setspecific(arena_key, save_arena);
  __malloc_hook = save_malloc_hook;
  __free_hook = save_free_hook;
#endif

These lines look to be responsible for restoring the original hooks in
the child process.  I'm guessing that's important.  Apparently, neither
_LIBC nor MALLOC_HOOKS is defined.  It looks like this is the only
place MALLOC_HOOKS is referenced in all of the source code, which leads
me to believe it is deprecated.  I'm guessing _LIBC is the critical one
here.  Should this macro be defined?
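
To make the suspected failure mode concrete, here is a toy C model.  It is
not MVAPICH2 or glibc code.  It assumes the hooks are swapped by fork
handlers (as the `atfork' comment in malloc_atfork() suggests) and
registers stand-ins for them with pthread_atfork().  Every toy_* name is
made up for illustration, and toy_restore_in_child = 0 stands in for the
#if block being compiled out.  A depth guard is added so the toy stops
instead of recursing until memory runs out.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef void *(*toy_hook_t)(size_t);

static toy_hook_t toy_malloc_hook = NULL;  /* plays the role of __malloc_hook */
static toy_hook_t toy_saved_hook = NULL;   /* plays the role of save_malloc_hook */
static int toy_restore_in_child = 1;       /* 0 mimics the #if being compiled out */
static int toy_depth = 0;
enum { TOY_MAX_DEPTH = 1000 };             /* guard that the real code does not have */

static void *toy_malloc(size_t bytes);

/* Hook that is active while fork() is in progress; it simply calls back
 * into the public allocator, like the else clause of malloc_atfork(). */
static void *toy_malloc_atfork(size_t bytes)
{
    if (++toy_depth > TOY_MAX_DEPTH)
        return NULL;
    return toy_malloc(bytes);
}

/* Public allocator: dispatch through the hook if one is installed. */
static void *toy_malloc(size_t bytes)
{
    toy_hook_t hook = toy_malloc_hook;
    if (hook != NULL)
        return hook(bytes);
    return malloc(bytes);
}

static void toy_prepare(void)   /* runs in the parent before fork() */
{
    toy_saved_hook = toy_malloc_hook;
    toy_malloc_hook = toy_malloc_atfork;
}

static void toy_parent(void)    /* runs in the parent after fork() */
{
    toy_malloc_hook = toy_saved_hook;
}

static void toy_child(void)     /* runs in the child after fork() */
{
    if (toy_restore_in_child)
        toy_malloc_hook = toy_saved_hook;
}

/* Fork a child, allocate once in it, and report what happened:
 * 0 = child allocated normally, 1 = child looped through the hook,
 * -1 = fork/wait error. */
static int toy_child_recurses(int restore_in_child)
{
    static int registered = 0;
    if (!registered) {
        if (pthread_atfork(toy_prepare, toy_parent, toy_child) != 0)
            return -1;
        registered = 1;
    }
    toy_restore_in_child = restore_in_child;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        void *p = toy_malloc(16);
        int looped = (p == NULL && toy_depth > TOY_MAX_DEPTH);
        free(p);
        _exit(looped ? 1 : 0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In this model, toy_child_recurses(1) returns 0 because the child handler
puts the saved hook back, while toy_child_recurses(0) returns 1 because
the child keeps re-entering toy_malloc_atfork().  Whether the real library
fails for the same reason is still only my guess.
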
-Adam


Jonathan Perkins wrote:

>Hi Adam.  Thanks for the report and debugging info.  We're inspecting
>this code path.  In the meantime, can you provide us with a simple
>reproducer to help us investigate this further?
>
>On Mon, Dec 15, 2014 at 05:53:53PM -0800, Adam T. Moody wrote:
>
>>Hello MVAPICH team,
>>We have a code using MVAPICH2-1.9 that forks a process whose child then dies
>>after it eventually consumes all available memory.  If I SIGSTOP the child
>>and attach to it before it dies, I can see from its stack trace that it's
>>apparently in an infinite recursion loop consisting of calls to:
>>
>>malloc_atfork()
>>malloc() at mvapich_malloc.c:3403
>>
>>I can see that mvapich_malloc.c:3403 is the last line of the following,
>>which invokes the __malloc_hook function pointer:
>>
>> __malloc_ptr_t (*hook) __MALLOC_P ((size_t, __const __malloc_ptr_t)) =
>>   __malloc_hook;
>> if (hook != NULL)
>>   return (*hook)(bytes, RETURN_ADDRESS (0));
>>
>>From the stack trace, I can deduce that __malloc_hook must be pointing to
>>malloc_atfork().
>>
>>Then looking at the malloc_atfork() implementation, I can see that it calls
>>public_mALLOc() in its else clause, which seems like it may be the code
>>path leading to the loop:
>>
>> } else {
>>   /* Suspend the thread until the `atfork' handlers have completed.
>>      By that time, the hooks will have been reset as well, so that
>>      mALLOc() can be used again. */
>>   (void)mutex_lock(&list_lock);
>>   (void)mutex_unlock(&list_lock);
>>   return public_mALLOc(sz);
>> }
>>
>>Do you have ideas how this might happen?  Can you imagine a case that would
>>lead to a loop here?
>>
>>I see a lock followed immediately by an unlock.  Does this lock really
>>protect anything?
>>Thanks,
>>-Adam
>>_______________________________________________
>>mvapich-discuss mailing list
>>mvapich-discuss at cse.ohio-state.edu
>>http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss


