[mvapich-discuss] Infinite loop in ptmalloc
Adam T. Moody
moody20 at llnl.gov
Tue Dec 16 17:44:39 EST 2014
Hi Jonathan,
I'll look into a reproducer. Right now, it's not trivial to reproduce.
It's a python app that uses MPI. The python process uses Popen to fork
and exec "cat" to read "/proc/cpuinfo". It's this child process that
then gets stuck in the infinite recursion loop. We found that a
workaround is to use Popen to start a shell, which then cats the file... ugh.
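As a possible starting point for a reproducer, the fork-and-exec pattern described above can be sketched in C. This is only a sketch under stated assumptions: the real application is Python using Popen, the helper name run_cat is hypothetical, and the explicit malloc() in the child is a guess at where the child would enter the allocator between fork and exec. It has not been shown to trigger the hang. To exercise the MVAPICH2 allocator at all, it would need to be linked against the library and called after MPI_Init.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that allocates once and then execs
 * "cat <path>". Returns the child's exit status, or -1 on error. */
int run_cat(const char *path)
{
    pid_t pid;
    int status = 0;

    assert(path != NULL);
    fflush(NULL);               /* do not duplicate buffered stdio in the child */

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child. Assumption: an allocation between fork and exec is what
         * sends the child into malloc() while the atfork hook is still set. */
        void *p = malloc(64);
        free(p);
        execlp("cat", "cat", path, (char *)NULL);
        _exit(127);             /* only reached if exec failed */
    }

    /* Parent: wait for the child and report how it exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_cat("/proc/cpuinfo") from a small MPI program would mirror the failing case. If the child hangs and grows in memory instead of printing the file, the sketch has reproduced the problem.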
One thing I can see under Totalview is that ptmalloc_unlock_all2 is
defined and therefore apparently used, however the #if below was not
compiled into the library:
#if defined _LIBC || defined MALLOC_HOOKS
tsd_setspecific(arena_key, save_arena);
__malloc_hook = save_malloc_hook;
__free_hook = save_free_hook;
#endif
These lines look to be responsible for restoring the original hooks in
the child process. I'm guessing that's important. Apparently, neither
_LIBC nor MALLOC_HOOKS is defined. It looks like this is the only
place MALLOC_HOOKS is referenced in all of the source code, which leads me
to believe it is deprecated. I'm guessing _LIBC is the critical one
here. Should this macro be defined?
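To illustrate why the missing restore would matter, here is a small self-contained model in C. It is not the MVAPICH2 or ptmalloc code: every name in it is a hypothetical stand-in, it assumes the atfork hook is installed before the fork (the source above only shows the restore side), and it models only the else branch of malloc_atfork(). A depth limit replaces the unbounded recursion so that the model terminates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the hook machinery. */
typedef void *(*alloc_hook_t)(size_t);

static alloc_hook_t alloc_hook = NULL;  /* plays the role of __malloc_hook */
static alloc_hook_t saved_hook = NULL;  /* plays the role of save_malloc_hook */
static int depth = 0;                   /* current recursion depth */
enum { DEPTH_LIMIT = 1000 };            /* the real code has no such guard */

static void *public_alloc(size_t sz);

/* Models the else branch of malloc_atfork(): fall back to the public
 * entry point, trusting that the hooks have been reset by now. */
static void *atfork_alloc(size_t sz)
{
    return public_alloc(sz);
}

/* Models the public malloc entry: dispatch through the hook if one is set. */
static void *public_alloc(size_t sz)
{
    void *p;

    if (depth >= DEPTH_LIMIT)
        return NULL;                    /* recursion detected, give up */
    depth++;
    p = (alloc_hook != NULL) ? alloc_hook(sz) : malloc(sz);
    depth--;
    return p;
}

/* Returns 1 if an allocation succeeds after the simulated fork,
 * 0 if it recursed until the depth limit. */
int alloc_works_after_fork(int restore_enabled)
{
    void *p;
    int ok;

    alloc_hook = NULL;
    saved_hook = alloc_hook;            /* before fork: save the hook ... */
    alloc_hook = atfork_alloc;          /* ... and install the atfork variant */

    if (restore_enabled)                /* the block guarded by the #if */
        alloc_hook = saved_hook;

    p = public_alloc(16);
    assert(depth == 0);                 /* recursion has fully unwound */
    ok = (p != NULL);
    free(p);
    alloc_hook = NULL;
    return ok;
}
```

In this model, alloc_works_after_fork(1) allocates normally, while alloc_works_after_fork(0) bounces between public_alloc() and atfork_alloc() until the limit is hit, which matches the shape of the stack trace seen in the child.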
-Adam
Jonathan Perkins wrote:
>Hi Adam. Thanks for the report and debugging info. We're inspecting
>this code path. In the meantime, can you provide us with a simple
>reproducer to help us investigate this further?
>
>On Mon, Dec 15, 2014 at 05:53:53PM -0800, Adam T. Moody wrote:
>
>
>>Hello MVAPICH team,
>>We have a code using MVAPICH2-1.9 that forks a process whose child then dies
>>after it eventually consumes all available memory. If I SIGSTOP the child
>>and attach to it before it dies, I can see from its stack trace that it's
>>apparently in an infinite recursion loop consisting of calls to:
>>
>>malloc_atfork()
>>malloc() at mvapich_malloc.c:3403
>>
>>I can see that mvapich_malloc.c:3403 is the last line of the following,
>>which invokes the __malloc_hook function pointer:
>>
>> __malloc_ptr_t (*hook) __MALLOC_P ((size_t, __const __malloc_ptr_t)) =
>> __malloc_hook;
>> if (hook != NULL)
>> return (*hook)(bytes, RETURN_ADDRESS (0));
>>
>>From the stack trace, I can deduce that __malloc_hook must be pointing to
>>malloc_atfork().
>>
>>Then looking at the malloc_atfork() implementation, I can see that it calls
>>public_mALLOc() in its else clause, which seems like it may be the code
>>path leading to the loop:
>>
>> } else {
>> /* Suspend the thread until the `atfork' handlers have completed.
>> By that time, the hooks will have been reset as well, so that
>> mALLOc() can be used again. */
>> (void)mutex_lock(&list_lock);
>> (void)mutex_unlock(&list_lock);
>> return public_mALLOc(sz);
>> }
>>
>>Do you have ideas how this might happen? Can you imagine a case that would
>>lead to a loop here?
>>
>>I see a lock followed immediately by an unlock. Does this lock really
>>protect anything?
>>Thanks,
>>-Adam