[mvapich-discuss] Infinite loop in ptmalloc
Adam T. Moody
moody20 at llnl.gov
Tue Dec 16 20:09:55 EST 2014
Hi Jonathan,
I commented out the #if _LIBC check in ptmalloc_unlock_all2(), around
line 267 in mpid/ch3/channels/common/src/memory/ptmalloc2/arena.c to
force the following three lines to be compiled into the library:
tsd_setspecific(arena_key, save_arena);
__malloc_hook = save_malloc_hook;
__free_hook = save_free_hook;
This seems to work around the problem.
Can you take a closer look at this #if to double-check whether it should
be there?
-Adam
Jonathan Perkins wrote:
>Hi Adam. Thanks for the additional information. We've received some
>reports about problems with our internal ptmalloc2 implementation and
>python applications misbehaving when used together.
>
>I'm glad that you have a workaround for the time being. I'll touch
>base with you again once we've created a reproducer and have more
>insight into what is happening in this case.
>
>On Tue, Dec 16, 2014 at 02:44:39PM -0800, Adam T. Moody wrote:
>
>
>>Hi Jonathan,
>>I'll look into a reproducer. Right now, it's not trivial to reproduce.
>>It's a python app that uses MPI. The python process uses Popen to fork and
>>exec "cat" to read "/proc/cpuinfo". It's this child process that then gets
>>stuck in the infinite recursion loop. We found that a workaround is to use
>>Popen to start a shell, which then cats the file... ugh.
>>
>>One thing I can see under Totalview is that ptmalloc_unlock_all2 is defined
>>and therefore apparently used, however the #if below was not compiled into
>>the library:
>>
>>#if defined _LIBC || defined MALLOC_HOOKS
>> tsd_setspecific(arena_key, save_arena);
>> __malloc_hook = save_malloc_hook;
>> __free_hook = save_free_hook;
>>#endif
>>
>>These lines look to be responsible for restoring the original hooks in the
>>child process. I'm guessing that's important. Apparently, neither _LIBC
>>nor MALLOC_HOOKS is defined. It looks like this is the only place
>>MALLOC_HOOKS is referenced in all of the source code, which leads me to believe
>>this is deprecated. I'm guessing _LIBC is the critical one here. Should this
>>macro be defined?
>>-Adam
>>
>>
>>Jonathan Perkins wrote:
>>
>>
>>
>>>Hi Adam. Thanks for the report and debugging info. We're inspecting
>>>this code path. In the meantime, can you provide us with a simple
>>>reproducer to help us investigate this further?
>>>
>>>On Mon, Dec 15, 2014 at 05:53:53PM -0800, Adam T. Moody wrote:
>>>
>>>
>>>
>>>>Hello MVAPICH team,
>>>>We have a code using MVAPICH2-1.9 that forks a process whose child then dies
>>>>after it eventually consumes all available memory. If I SIGSTOP the child
>>>>and attach to it before it dies, I can see from its stack trace that it's
>>>>apparently in an infinite recursion loop consisting of calls to:
>>>>
>>>>malloc_atfork()
>>>>malloc() at mvapich_malloc.c:3403
>>>>
>>>>I can see that mvapich_malloc.c:3403 is the last line of the following,
>>>>which invokes the __malloc_hook function pointer:
>>>>
>>>>__malloc_ptr_t (*hook) __MALLOC_P ((size_t, __const __malloc_ptr_t)) =
>>>> __malloc_hook;
>>>>if (hook != NULL)
>>>> return (*hook)(bytes, RETURN_ADDRESS (0));
>>>>
>>>>
>>>>
>>>>From the stack trace, I can deduce that __malloc_hook must be pointing to
>>>>malloc_atfork().
>>>>
>>>>Then looking at the malloc_atfork() implementation, I can see that it calls
>>>>public_mALLOc() in its else clause, which seems like it may be the code
>>>>path leading to the loop:
>>>>
>>>>} else {
>>>> /* Suspend the thread until the `atfork' handlers have completed.
>>>> By that time, the hooks will have been reset as well, so that
>>>> mALLOc() can be used again. */
>>>> (void)mutex_lock(&list_lock);
>>>> (void)mutex_unlock(&list_lock);
>>>> return public_mALLOc(sz);
>>>>}
>>>>
>>>>Do you have ideas how this might happen? Can you imagine a case that would
>>>>lead to a loop here?
>>>>
>>>>I see a lock followed immediately by an unlock. Does this lock really
>>>>protect anything?
>>>>Thanks,
>>>>-Adam
>>>>_______________________________________________
>>>>mvapich-discuss mailing list
>>>>mvapich-discuss at cse.ohio-state.edu
>>>>http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>>
>>>>
>>>>
>>>
>>>
>
>
>