[mvapich-discuss] Deadlock in while calling malloc?

Jonathan Perkins perkinjo at cse.ohio-state.edu
Tue Nov 10 15:58:36 EST 2015


The mapping method should be fine.  Can you verify that you modified the
program to call MPI_Init_thread?
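
In case it helps, here is a minimal sketch of what that change could look like
(initialization only; the error handling and everything around it is a
placeholder, not taken from your code):

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int provided;

      /* Request FUNNELED: OpenMP threads exist, but only the main thread
         makes MPI calls. */
      MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
      if (provided < MPI_THREAD_FUNNELED) {
          fprintf(stderr, "MPI_THREAD_FUNNELED not provided (got %d)\n", provided);
          MPI_Abort(MPI_COMM_WORLD, 1);
      }

      /* ... existing OpenMP regions and MPI calls (main thread only) ... */

      MPI_Finalize();
      return 0;
  }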

Also let us know how many processes you're running per node.

On Tue, Nov 10, 2015 at 3:35 PM Martin Cuma <martin.cuma at utah.edu> wrote:

> Hi Jonathan,
>
> good points, tried that with "mpirun -genv MV2_ENABLE_AFFINITY 0 -bind-to
> numa -map-by numa" - as I am also trying to keep each process's threads on
> one socket. I still get the deadlock. This is with 240 processes, 4 threads
> each. When going down to 2 threads per process, it seems to run through, but
> that's probably just due to a decreased chance of threads stepping on each
> other.
>
> Any other thoughts on this?
>
> BTW, the above mpirun options (inherited from MPICH) seemed to be the
> simplest way to achieve socket affinity with a multithreaded MVAPICH2
> program - any objections to that?
>
> Thanks,
> MC
>
> --
> Martin Cuma
> Center for High Performance Computing
> Department of Geology and Geophysics
> University of Utah
>
> On Tue, 10 Nov 2015, Jonathan Perkins wrote:
>
> > Hello Martin.  Can you try modifying your program to call MPI_Init_thread
> > and request MPI_THREAD_FUNNELED?  When running your program, also set
> > MV2_ENABLE_AFFINITY=0.  I think this may resolve your issue since each
> > thread is actually entering the MPI library during the malloc calls.
> >
> > On Mon, Nov 9, 2015 at 4:27 PM Martin Cuma <martin.cuma at utah.edu> wrote:
> >       Hello everyone,
> >
> >       I am seeing an occasional deadlock in a code which mallocs some memory in
> >       an OpenMP threaded region. The MPI code is OpenMP threaded but does not
> >       communicate from the threads, so MPI is initialized with plain MPI_Init in
> >       the MPI_THREAD_SINGLE mode.
> >
> >       Everything seems to run fine until I hit about 200 MPI processes with 4 or
> >       more threads each. Then the program fairly reliably deadlocks on an MPI
> >       collective or a barrier, and when I investigate the cause, I see one or
> >       more processes not reaching the barrier. These processes are stuck inside
> >       a malloc call, e.g. as in the backtrace here:
> >       Backtrace:
> >       #0  0x00007fc405360fe6 in find_and_free_dregs_inside ()
> >           from /uufs/chpc.utah.edu/sys/installdir/mvapich2/2.1p/lib/libmpi.so.12
> >       #1  0x00007fc405391555 in mvapich2_mem_unhook ()
> >           at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/mem_hooks.c:148
> >       #2  0x00007fc40539174d in mvapich2_munmap ()
> >           at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/mem_hooks.c:270
> >       #3  0x00007fc405395661 in new_heap ()
> >           at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/ptmalloc2/arena.c:542
> >       #4  0x00007fc405391b80 in _int_new_arena ()
> >           at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/ptmalloc2/arena.c:762
> >       #5  0x00007fc4053958ff in arena_get2 ()
> >           at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/ptmalloc2/arena.c:717
> >       #6  0x00007fc405392c26 in malloc ()
> >           at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:3405
> >       #7  0x000000000043b470 in fillerp (Ebr=value has been optimized out)
> >           at ./fillerp.c:39
> >
> >       The typical allocation is done for temporary arrays inside of the threaded
> >       region, such as:
> >         22 #pragma omp parallel private(iif,iis,ii)
> >         23 {
> >         24  double _Complex *ne,*nh,*na;
> >       ...
> >         39  ne = (double _Complex *)malloc(sizeof(double _Complex)*3*invd->Nrlmax*irx);
> >       ...
> >
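> >       For context, a stripped-down, self-contained sketch of this pattern (with
> >       made-up sizes and names, not the actual code) would look roughly like:
> >
> >         #include <mpi.h>
> >         #include <stdlib.h>
> >         #include <complex.h>
> >
> >         int main(int argc, char **argv)
> >         {
> >             int n = 1000000;              /* stand-in for 3*invd->Nrlmax*irx */
> >             MPI_Init(&argc, &argv);       /* i.e. MPI_THREAD_SINGLE */
> >
> >             #pragma omp parallel
> >             {
> >                 /* each thread allocates and frees its own scratch array */
> >                 double _Complex *ne = malloc(sizeof(double _Complex) * n);
> >                 /* ... fill and use ne, no MPI calls from inside the threads ... */
> >                 free(ne);
> >             }
> >
> >             MPI_Barrier(MPI_COMM_WORLD);  /* this is where the hang shows up */
> >             MPI_Finalize();
> >             return 0;
> >         }
> >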
> >       The code should be fairly clean (checked with memory checkers such as
> >       Intel Inspector XE and used on a variety of systems/data sets, though not
> >       typically with this high a process count). Also, with MPICH2 or Intel
> >       MPI I am not seeing this deadlock, which makes me suspect an issue with
> >       MVAPICH2.
> >
> >       Before I dig further, I'd like to ask the forum whether this issue
> >       rings a bell for someone. Also, is it possible to modify the allocation
> >       behavior using environment variables, configure options, etc.? Any other
> >       thoughts/suggestions?
> >
> >       Thanks,
> >       MC
> >
> >       --
> >       Martin Cuma
> >       Center for High Performance Computing
> >       Department of Geology and Geophysics
> >       University of Utah
> >       _______________________________________________
> >       mvapich-discuss mailing list
> >       mvapich-discuss at cse.ohio-state.edu
> >       http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
> >
> >