[mvapich-discuss] Deadlock in while calling malloc?
Jonathan Perkins
perkinjo at cse.ohio-state.edu
Tue Nov 10 15:58:36 EST 2015
The mapping method should be fine. Can you verify that you modified the
program to call MPI_Init_thread?
Also let us know how many processes you're running per node.
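For reference, here is a minimal sketch of the change I have in mind (this is
not your code, just an illustration; the error check is optional):

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int provided;
      /* Request FUNNELED: OpenMP threads exist, but only the main
         thread is expected to make MPI calls directly. */
      MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
      if (provided < MPI_THREAD_FUNNELED) {
          fprintf(stderr, "requested MPI_THREAD_FUNNELED, got %d\n", provided);
          MPI_Abort(MPI_COMM_WORLD, 1);
      }
      /* ... OpenMP-threaded work as before ... */
      MPI_Finalize();
      return 0;
  }

Keep MV2_ENABLE_AFFINITY=0 set in the environment when you run it.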
On Tue, Nov 10, 2015 at 3:35 PM Martin Cuma <martin.cuma at utah.edu> wrote:
> Hi Jonathan,
>
> Good points. I tried that with "mpirun -genv MV2_ENABLE_AFFINITY 0 -bind-to
> numa -map-by numa", as I am also trying to keep each process's threads on
> one socket. I still get the deadlock. This is with 240 processes, 4 threads
> each. When going down to 2 threads per process it seems to run through, but
> that's probably just due to a decreased chance of the threads stepping on
> each other.
>
> Any other thoughts on this?
>
> BTW, the above mpirun options (inherited from MPICH) seemed to be the
> simplest way to achieve socket affinity with a multithreaded MVAPICH2
> program - any objections to that?
>
> Thanks,
> MC
>
> --
> Martin Cuma
> Center for High Performance Computing
> Department of Geology and Geophysics
> University of Utah
>
> On Tue, 10 Nov 2015, Jonathan Perkins wrote:
>
> > Hello Martin. Can you try modifying your program to call MPI_Init_thread
> > and request MPI_THREAD_FUNNELED? When running your program, also set
> > MV2_ENABLE_AFFINITY=0. I think this may resolve your issue since each
> > thread is actually entering the MPI library during the malloc calls.
> >
> > On Mon, Nov 9, 2015 at 4:27 PM Martin Cuma <martin.cuma at utah.edu> wrote:
> > Hello everyone,
> >
> > I am seeing an occasional deadlock in a code which mallocs some memory in
> > an OpenMP-threaded region. The code is OpenMP-threaded but does not
> > communicate from the threads, so MPI is initialized with plain MPI_Init,
> > i.e. in MPI_THREAD_SINGLE mode.
> >
> > Everything seems to run fine until I hit about 200 MPI processes with 4
> > or more threads each. Then the program fairly reliably deadlocks on an
> > MPI collective or a barrier, and when I investigate the cause I see one
> > or more processes not reaching the barrier. These processes are stuck
> > inside a malloc call, e.g. as in this backtrace:
> > Backtrace:
> > #0  0x00007fc405360fe6 in find_and_free_dregs_inside ()
> >     from /uufs/chpc.utah.edu/sys/installdir/mvapich2/2.1p/lib/libmpi.so.12
> > #1  0x00007fc405391555 in mvapich2_mem_unhook ()
> >     at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/mem_hooks.c:148
> > #2  0x00007fc40539174d in mvapich2_munmap ()
> >     at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/mem_hooks.c:270
> > #3  0x00007fc405395661 in new_heap ()
> >     at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/ptmalloc2/arena.c:542
> > #4  0x00007fc405391b80 in _int_new_arena ()
> >     at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/ptmalloc2/arena.c:762
> > #5  0x00007fc4053958ff in arena_get2 ()
> >     at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/ptmalloc2/arena.c:717
> > #6  0x00007fc405392c26 in malloc ()
> >     at ../../../srcdir/mvapich2/2.1/src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:3405
> > #7  0x000000000043b470 in fillerp (Ebr=<value has been optimized out>)
> >     at ./fillerp.c:39
> >
> > The typical allocation is done for temporary arrays inside of the
> > threaded region, such as:
> >
> >   22 #pragma omp parallel private(iif,iis,ii)
> >   23 {
> >   24     double _Complex *ne,*nh,*na;
> >      ...
> >   39     ne = (double _Complex *)malloc(sizeof(double _Complex)*3*invd->Nrlmax*irx);
> >      ...
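> >
> > In case it helps, here is a minimal self-contained sketch of the pattern
> > (not the actual code; the array size is just a placeholder for the
> > 3*invd->Nrlmax*irx expression above):
> >
> >   /* hypothetical reproducer: concurrent mallocs/frees from OpenMP
> >      threads, no MPI calls from the threads themselves */
> >   #include <complex.h>
> >   #include <stdlib.h>
> >   #include <mpi.h>
> >
> >   int main(int argc, char **argv)
> >   {
> >       MPI_Init(&argc, &argv);          /* MPI_THREAD_SINGLE, as in our code */
> >       for (int iter = 0; iter < 1000; iter++) {
> >           #pragma omp parallel
> >           {
> >               /* placeholder size standing in for 3*invd->Nrlmax*irx */
> >               double _Complex *ne = malloc(sizeof(double _Complex) * 3 * 4096);
> >               ne[0] = 0.0;             /* touch the allocation */
> >               free(ne);
> >           }
> >           MPI_Barrier(MPI_COMM_WORLD); /* this is where we hang */
> >       }
> >       MPI_Finalize();
> >       return 0;
> >   }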
> >
> > The code should be fairly clean (it has been checked with memory checkers
> > such as Intel Inspector XE and used on a variety of systems/data sets,
> > though not typically at such a high process count). Also, with MPICH2 or
> > Intel MPI I am not seeing this deadlock, which makes me suspect an issue
> > with MVAPICH2.
> >
> > Before I dig further, I'd like to ask the forum whether this issue rings
> > a bell for someone. Also, is it possible to modify the allocation
> > behavior using environment variables, configure options, etc.? Any other
> > thoughts/suggestions?
> >
> > Thanks,
> > MC
> >
> > --
> > Martin Cuma
> > Center for High Performance Computing
> > Department of Geology and Geophysics
> > University of Utah