[mvapich-discuss] Deadlock while calling malloc?

Martin Cuma martin.cuma at utah.edu
Mon Nov 9 16:26:24 EST 2015


Hello everyone,

I am seeing an occasional deadlock in a code that mallocs some memory in 
an OpenMP threaded region. The MPI code is OpenMP threaded, but it does 
not communicate from the threads, so MPI is initialized with plain 
MPI_Init, i.e. in MPI_THREAD_SINGLE mode.
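
For context, the structure is roughly as follows (a minimal sketch, not 
the actual code; the names and sizes are illustrative):

/* Sketch of the pattern described above: plain MPI_Init (MPI_THREAD_SINGLE),
 * OpenMP threads that allocate/free their own scratch memory, and MPI calls
 * made only outside the parallel region. Illustrative only. */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);   /* plain init, no MPI_Init_thread */

    #pragma omp parallel
    {
        /* each thread mallocs and frees its own scratch space;
         * no MPI calls are made from inside this region */
        double *scratch = malloc(1024 * sizeof *scratch);
        if (scratch) {
            scratch[0] = (double)omp_get_thread_num();
            free(scratch);
        }
    }

    /* collectives happen only here, from the single (master) thread */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}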

Everything seems to run fine until I hit about 200 MPI processes with 4 or 
more threads each. Then the program fairly reliably deadlocks on an MPI 
collective or a barrier, and when I investigate the cause, I see one or 
more processes not reaching the barrier. These processes are stuck inside 
a malloc call, as in the backtrace here:
Backtrace:
#0  0x00007fc405360fe6 in find_and_free_dregs_inside ()
    from /uufs/chpc.utah.edu/sys/installdir/mvapich2/2.1p/lib/libmpi.so.12
#1  0x00007fc405391555 in mvapich2_mem_unhook ()
    at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/mem_hooks.c:148
#2  0x00007fc40539174d in mvapich2_munmap ()
    at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/mem_hooks.c:270
#3  0x00007fc405395661 in new_heap ()
    at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/ptmalloc2/arena.c:542
#4  0x00007fc405391b80 in _int_new_arena ()
    at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/ptmalloc2/arena.c:762
#5  0x00007fc4053958ff in arena_get2 ()
    at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/ptmalloc2/arena.c:717
#6  0x00007fc405392c26 in malloc ()
    at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:3405
#7  0x000000000043b470 in fillerp (Ebr=value has been optimized out)
    at ./fillerp.c:39

The typical allocation is for temporary arrays inside the threaded 
region, such as:
  22 #pragma omp parallel private(iif,iis,ii)
  23 {
  24  double _Complex *ne,*nh,*na;
...
  39  ne = (double _Complex *)malloc(sizeof(double _Complex)*3*invd->Nrlmax*irx);
...
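
Put together, the pattern is roughly the following (a simplified, 
self-contained sketch rather than the real routine; the struct and 
function names here are made up, only invd->Nrlmax and irx come from the 
snippet above):

#include <complex.h>
#include <stdlib.h>

struct inv_data { int Nrlmax; };   /* hypothetical stand-in for invd's type */

void fillerp_like(const struct inv_data *invd, int irx)
{
    #pragma omp parallel
    {
        /* each thread allocates and frees its own temporary arrays,
         * so concurrent malloc/free from many threads is expected */
        double _Complex *ne = malloc(sizeof(double _Complex) * 3 * invd->Nrlmax * irx);
        double _Complex *nh = malloc(sizeof(double _Complex) * 3 * invd->Nrlmax * irx);
        double _Complex *na = malloc(sizeof(double _Complex) * 3 * invd->Nrlmax * irx);

        /* ... per-thread work on ne/nh/na, no MPI calls ... */

        free(ne);
        free(nh);
        free(na);
    }
}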

The code should be fairly clean (it has been checked with memory checkers 
such as Intel Inspector XE and used on a variety of systems and data sets, 
though not typically with this high a process count). Also, I do not see 
this deadlock with MPICH2 or Intel MPI, which makes me suspect an issue in 
MVAPICH2.

Before I dig further, I'd like to ask the forum whether this issue rings 
a bell for anyone. Also, is it possible to modify the allocation behavior 
using environment variables, configure options, etc.? Any other 
thoughts/suggestions?

Thanks,
MC

-- 
Martin Cuma
Center for High Performance Computing
Department of Geology and Geophysics
University of Utah

