[mvapich-discuss] MVAPICH problem in MPI_Finalize

Mark Potts potts at hpcapplications.com
Wed Jul 11 23:01:17 EDT 2007


Hi,
    I've finally tracked down an intermittent problem that causes MVAPICH
    processes to generate segmentation faults during shutdown.
    It seems to happen only on fairly large jobs on a 256-node cluster
    (8-32 cores/node).  The following is the backtrace from the core
    file of one of the failed processes from a purposely simple program
    (simpleprint_c).  This particular job ran with 1024 processes.
    We are using ch_gen2 MVAPICH 0.9.9 singlerail with _SMP turned on.
    This segmentation fault occurs across a host of different programs,
    but never on all processes, and it appears seemingly at random from
    one run to the next.

    From the core dump, the seg fault is triggered by the call
    to MPI_Finalize() but ultimately occurs in the free() function of
    ptmalloc2/malloc.c.
    From some cursory code examination, it appears that the error
    is hit when trying to unmap a memory segment.  Since the
    seg fault occurrence is seemingly random, could this be a
    timing issue in which processes within an SMP node get confused
    about which one should be unmapping/freeing memory?


gdb simpleprint_c core.9334
:
:
Core was generated by `/var/tmp/mjpworkspace/simpleprint_c'.
Program terminated with signal 11, Segmentation fault.
#0  free (mem=0xfa00940af900940a) at ptmalloc2/malloc.c:3455
3455    ptmalloc2/malloc.c: No such file or directory.
         in ptmalloc2/malloc.c
(gdb) bt
#0  free (mem=0xfa00940af900940a) at ptmalloc2/malloc.c:3455
#1  0x00002b70b40489c5 in free_2level_comm (comm_ptr=0x57a720) at create_2level_comm.c:49
#2  0x00002b70b40461af in PMPI_Comm_free (commp=0x7ffff6bb4e44) at comm_free.c:187
#3  0x00002b70b404604f in PMPI_Comm_free (commp=0x7ffff6bb4e70) at comm_free.c:217
#4  0x00002b70b404d56e in PMPI_Finalize () at finalize.c:159
#5  0x0000000000400814 in main (argc=1, argv=0x7ffff6bb4fa8) at simple.c:18
(gdb)
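
    One further observation on the backtrace: the pointer that
    free_2level_comm() hands to free(), 0xfa00940af900940a, does not
    look like a heap address at all.  The helper below is a minimal
    sketch (the function name is my own, not anything in MVAPICH or
    glibc) that tests whether a 64-bit value is a canonical x86-64
    address, assuming 48-bit virtual addressing, where bits 63..47 must
    all equal bit 47.  The free() argument fails that test, whereas
    comm_ptr=0x57a720 and the stack addresses in frames #2 to #5 pass
    it.  That might point to free() being given an uninitialized or
    overwritten pointer rather than one freed twice, though this is
    only an inference from the address value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if addr is a canonical x86-64
 * virtual address, ASSUMING 48-bit virtual addressing (4-level paging).
 * In that mode bits 63..47 (17 bits) must be all zeros (lower half,
 * user space on Linux) or all ones (upper half, kernel space).
 * Touching a non-canonical address raises a general-protection fault,
 * which Linux delivers to the process as SIGSEGV (signal 11). */
bool is_canonical_x86_64(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* bits 63..47 */
    return top == 0 || top == 0x1FFFFu; /* all zeros or all ones */
}
```

    For example, 0xfa00940af900940a >> 47 is 0x1f401, which is neither
    0 nor 0x1ffff, so glibc's free() faults as soon as it reads the
    chunk header just below that pointer.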

        regards,
-- 
***********************************
 >> Mark J. Potts, PhD
 >>
 >> HPC Applications Inc.
 >> phone: 410-992-8360 Bus
 >>        410-313-9318 Home
 >>        443-418-4375 Cell
 >> email: potts at hpcapplications.com
 >>        potts at excray.com
***********************************

