[mvapich-discuss] MPI_Finalize causes segmentation fault

Hari Subramoni subramoni.1 at osu.edu
Tue Aug 16 11:13:21 EDT 2016


Hello,

Sorry to hear that you're facing an issue with MVAPICH2. If possible, can
you share your test program with us? In the meantime, can you try running
your program after setting MV2_USE_INDEXED_TUNING=0?
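
For example (the host names and process count below are placeholders taken
from your log output; please adjust them to match your actual run):

  mpirun_rsh -np 2 gpu-cluster-1 gpu-cluster-2 MV2_USE_INDEXED_TUNING=0 ./your_app

or, if you launch with mpiexec:

  mpiexec -n 2 -genv MV2_USE_INDEXED_TUNING 0 ./your_app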

Thx,
Hari.

On Tue, Aug 16, 2016 at 11:01 AM, 吴雪 <sy1406125 at buaa.edu.cn> wrote:

> Hi,
> I've run into a problem. My program terminated with signal SIGSEGV
> (segmentation fault). The error output is:
> [gpu-cluster-2:mpi_rank_0][error_sighandler] Caught error: Segmentation
> fault (signal 11)
>
> ============================================================
> =======================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 23147 RUNNING AT 192.168.2.2
> =   EXIT CODE: 139
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ============================================================
> =======================
> [proxy:1:0 at gpu-cluster-1] HYD_pmcd_pmip_control_cmd_cb
> (pm/pmiserv/pmip_cb.c:912): assert (!closed) failed
> [proxy:1:0 at gpu-cluster-1] HYDT_dmxu_poll_wait_for_event
> (tools/demux/demux_poll.c:76): callback returned error status
> [proxy:1:0 at gpu-cluster-1] main (pm/pmiserv/pmip.c:206): demux engine
> error waiting for event
> [mpiexec at gpu-cluster-2] HYDT_bscu_wait_for_completion
> (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated
> badly; aborting
> [mpiexec at gpu-cluster-2] HYDT_bsci_wait_for_completion
> (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for
> completion
> [mpiexec at gpu-cluster-2] HYD_pmci_wait_for_completion
> (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for
> completion
> [mpiexec at gpu-cluster-2] main (ui/mpich/mpiexec.c:344): process manager
> error waiting for completion
>
>
> and the gdb backtrace information is:
> #0  0x00007fb82b7b7613 in _int_free () from /home/run/wx-workplace/
> mvapich2-2.1/lib/libmpi.so.12
> #1  0x00007fb82b7b7b1b in free () from /home/run/wx-workplace/
> mvapich2-2.1/lib/libmpi.so.12
> #2  0x00007fb82b69075d in MV2_cleanup_gather_tuning_table () from
> /home/run/wx-workplace/mvapich2-2.1/lib/libmpi.so.12
> #3  0x00007fb82b5568b7 in MV2_collectives_arch_finalize () from
> /home/run/wx-workplace/mvapich2-2.1/lib/libmpi.so.12
> #4  0x00007fb82b768df7 in MPIDI_CH3_Finalize () from
> /home/run/wx-workplace/mvapich2-2.1/lib/libmpi.so.12
> #5  0x00007fb82b75e49b in MPID_Finalize () from /home/run/wx-workplace/
> mvapich2-2.1/lib/libmpi.so.12
> #6  0x00007fb82b6e8037 in PMPI_Finalize () from /home/run/wx-workplace/
> mvapich2-2.1/lib/libmpi.so.12
> #7  0x00007fb82bd4b6e4 in cudaRemoteFinalize () from ./libcudart_remote.so
> #8  0x00007fb82bd505db in GC_InitStruct::~GC_InitStruct() () from
> ./libcudart_remote.so
> #9  0x00007fb82adef4da in __cxa_finalize () from
> /lib/x86_64-linux-gnu/libc.so.6
> #10 0x00007fb82bd4a723 in __do_global_dtors_aux () from
> ./libcudart_remote.so
> #11 0x00007fffc59facb0 in ?? ()
>
> In my program, I use MPI_Comm_spawn to start several child programs. The
> parent and the children communicate using MPI_Isend, MPI_Irecv,
> MPI_Recv_init, MPI_Start, and MPI_Wait. I have not been able to find out
> what causes the segmentation fault. Also, what exactly does MPI_Finalize do?
>
> Looking forward to your reply.
>
> Thanks
> xue
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
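
To the question above: MPI_Finalize tears down MPI's internal state (in
MVAPICH2 this includes freeing the collective tuning tables that appear in
your backtrace), and all communication a process is involved in should be
completed, and persistent requests freed, before it is called. Below is a
minimal sketch of the spawn + persistent-request pattern you describe, just
to show the shape of a reproducer we could work with; the child binary name,
process count, message contents, and tags are assumptions, not your actual
code:

/* parent.c: minimal sketch -- spawn children, exchange one message using
 * nonblocking and persistent requests, clean up, then finalize. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm children;
    MPI_Request sreq, rreq;
    int out = 42, in = 0;

    MPI_Init(&argc, &argv);

    /* Spawn 2 copies of a hypothetical child executable "./child". */
    MPI_Comm_spawn("./child", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                   MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

    /* Persistent receive from child rank 0 on the intercommunicator. */
    MPI_Recv_init(&in, 1, MPI_INT, 0, 0, children, &rreq);
    MPI_Start(&rreq);

    /* Nonblocking send to child rank 0. */
    MPI_Isend(&out, 1, MPI_INT, 0, 0, children, &sreq);

    /* Complete both operations before touching the buffers again. */
    MPI_Wait(&sreq, MPI_STATUS_IGNORE);
    MPI_Wait(&rreq, MPI_STATUS_IGNORE);
    printf("parent received %d\n", in);

    /* Free the persistent request and disconnect before finalizing. */
    MPI_Request_free(&rreq);
    MPI_Comm_disconnect(&children);

    MPI_Finalize();
    return 0;
}

If your real code follows this shape, please also check that every request
started with MPI_Start or MPI_Isend/MPI_Irecv is completed (MPI_Wait or
MPI_Test) and that persistent requests are freed before MPI_Finalize.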