[mvapich-discuss] MPI_Finalize causes segmentation fault

吴雪 sy1406125 at buaa.edu.cn
Tue Aug 16 11:01:01 EDT 2016


Hi,
I've run into a problem: my program terminates with signal SIGSEGV (Segmentation fault). The error output is:
[gpu-cluster-2:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 23147 RUNNING AT 192.168.2.2
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:1:0 at gpu-cluster-1] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:912): assert (!closed) failed
[proxy:1:0 at gpu-cluster-1] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:1:0 at gpu-cluster-1] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec at gpu-cluster-2] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec at gpu-cluster-2] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec at gpu-cluster-2] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec at gpu-cluster-2] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion




and the gdb backtrace is:
#0  0x00007fb82b7b7613 in _int_free () from /home/run/wx-workplace/mvapich2-2.1/lib/libmpi.so.12
#1  0x00007fb82b7b7b1b in free () from /home/run/wx-workplace/mvapich2-2.1/lib/libmpi.so.12
#2  0x00007fb82b69075d in MV2_cleanup_gather_tuning_table () from /home/run/wx-workplace/mvapich2-2.1/lib/libmpi.so.12
#3  0x00007fb82b5568b7 in MV2_collectives_arch_finalize () from /home/run/wx-workplace/mvapich2-2.1/lib/libmpi.so.12
#4  0x00007fb82b768df7 in MPIDI_CH3_Finalize () from /home/run/wx-workplace/mvapich2-2.1/lib/libmpi.so.12
#5  0x00007fb82b75e49b in MPID_Finalize () from /home/run/wx-workplace/mvapich2-2.1/lib/libmpi.so.12
#6  0x00007fb82b6e8037 in PMPI_Finalize () from /home/run/wx-workplace/mvapich2-2.1/lib/libmpi.so.12
#7  0x00007fb82bd4b6e4 in cudaRemoteFinalize () from ./libcudart_remote.so
#8  0x00007fb82bd505db in GC_InitStruct::~GC_InitStruct() () from ./libcudart_remote.so
#9  0x00007fb82adef4da in __cxa_finalize () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x00007fb82bd4a723 in __do_global_dtors_aux () from ./libcudart_remote.so
#11 0x00007fffc59facb0 in ?? ()


In my program, I use MPI_Comm_spawn to start several child programs. The parent and children communicate using MPI_Isend, MPI_Irecv, MPI_Recv_init, MPI_Start, and MPI_Wait. I have not been able to find out what causes the segmentation fault. Also, what exactly does MPI_Finalize do?
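For reference, here is a minimal sketch of the communication pattern on the parent side. It is simplified from my real code; the child executable name "./child", the process count, and the buffer sizes are placeholders, not the actual values.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* Spawn the child programs ("./child" and the count are placeholders). */
        MPI_Comm children;
        MPI_Comm_spawn("./child", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

        /* Persistent receive from child rank 0 on the intercommunicator. */
        double recv_buf[16];
        MPI_Request recv_req;
        MPI_Recv_init(recv_buf, 16, MPI_DOUBLE, 0, 0, children, &recv_req);

        /* Non-blocking send to child rank 0. */
        double send_buf[16] = {0};
        MPI_Request send_req;
        MPI_Isend(send_buf, 16, MPI_DOUBLE, 0, 0, children, &send_req);

        /* Start the persistent receive and wait for both operations. */
        MPI_Start(&recv_req);
        MPI_Wait(&send_req, MPI_STATUS_IGNORE);
        MPI_Wait(&recv_req, MPI_STATUS_IGNORE);

        /* Release the persistent request and the intercommunicator
           before finalizing. */
        MPI_Request_free(&recv_req);
        MPI_Comm_disconnect(&children);

        MPI_Finalize();
        return 0;
    }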


Looking forward to your reply.


Thanks
xue