[mvapich-discuss] Deadlock in while calling malloc?
Martin Cuma
martin.cuma at utah.edu
Mon Nov 9 16:26:24 EST 2015
Hello everyone,
I am seeing an occasional deadlock in a code that mallocs memory inside
an OpenMP threaded region. The code is hybrid MPI+OpenMP, but it does not
communicate from within the threads, so MPI is initialized with plain
MPI_Init, i.e., in MPI_THREAD_SINGLE mode.
Everything seems to run fine until I reach about 200 MPI processes with 4
or more threads each. Then the program fairly reliably deadlocks on an MPI
collective or a barrier, and when I investigate the cause, I see one or
more processes never reaching the barrier. These processes are stuck
inside a malloc call, as in the backtrace here:
Backtrace:
#0  0x00007fc405360fe6 in find_and_free_dregs_inside () from /uufs/chpc.utah.edu/sys/installdir/mvapich2/2.1p/lib/libmpi.so.12
#1  0x00007fc405391555 in mvapich2_mem_unhook () at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/mem_hooks.c:148
#2  0x00007fc40539174d in mvapich2_munmap () at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/mem_hooks.c:270
#3  0x00007fc405395661 in new_heap () at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/ptmalloc2/arena.c:542
#4  0x00007fc405391b80 in _int_new_arena () at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/ptmalloc2/arena.c:762
#5  0x00007fc4053958ff in arena_get2 () at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/ptmalloc2/arena.c:717
#6  0x00007fc405392c26 in malloc () at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:3405
#7  0x000000000043b470 in fillerp (Ebr=<value has been optimized out>) at ./fillerp.c:39
The typical allocation is for temporary arrays inside the threaded
region, such as:
22 #pragma omp parallel private(iif,iis,ii)
23 {
24 double _Complex *ne,*nh,*na;
...
39 ne = (double _Complex *)malloc(sizeof(double _Complex)*3*invd->Nrlmax*irx);
...
The code should be fairly clean (checked with memory checkers such as
Intel Inspector XE and used on a variety of systems and data sets, though
not typically at this high a process count). Also, I do not see this
deadlock with MPICH2 or Intel MPI, which makes me suspect an issue in
MVAPICH2.
Before I dig further, I'd like to ask the forum whether this issue rings
a bell for anyone. Also, is it possible to modify the allocation behavior
via environment variables, configure options, etc.? Any other
thoughts/suggestions?
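For what it's worth, the backtrace goes through MVAPICH2's lazy memory
deregistration path (find_and_free_dregs_inside / mvapich2_mem_unhook), so
the one knob I was planning to try myself is disabling the InfiniBand
registration cache. The sketch below is just my best guess from the
MVAPICH2 user guide, not something I have verified fixes this deadlock:

```shell
# Untested workaround sketch: turn off MVAPICH2's lazy memory
# deregistration (registration cache), which is the code path the
# stuck processes are sitting in. Set before launching the job:
export MV2_USE_LAZY_MEM_UNREGISTER=0

# The build-time equivalent, if rebuilding MVAPICH2 is an option,
# should be the --disable-registration-cache configure flag.

echo "MV2_USE_LAZY_MEM_UNREGISTER=$MV2_USE_LAZY_MEM_UNREGISTER"
```

I would expect this to cost some point-to-point bandwidth on large
messages, since buffers get re-registered on every transfer, so it is a
diagnostic rather than a fix.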
Thanks,
MC
--
Martin Cuma
Center for High Performance Computing
Department of Geology and Geophysics
University of Utah