[mvapich-discuss] segmentation fault (signal 11), Exit code 139
Hoot Thompson
hoot at ptpnow.com
Fri Aug 17 12:36:00 EDT 2012
[root at mas-nn-ib bin]# /usr/local/other/mvapich2/bin/mpiexec -n 2 -env MV2_DEBUG_SHOW_BACKTRACE 1 -hosts mas-nn-ib,mas-dn1-ib /usr/local/other/benchmarks/osu_benchmarks/osu_bw
[mas-dn01-ib:mpi_rank_1][error_sighandler] Caught error: Segmentation fault (signal 11)
[mas-dn01-ib:mpi_rank_1][print_backtrace] 0: /usr/local/other/benchmarks/osu_benchmarks/osu_bw() [0x4f1b19]
[mas-dn01-ib:mpi_rank_1][print_backtrace] 1: /lib64/libpthread.so.0() [0x3ebd60f500]
[mas-dn01-ib:mpi_rank_1][print_backtrace] 2: /usr/local/other/benchmarks/osu_benchmarks/osu_bw() [0x452901]
[mas-dn01-ib:mpi_rank_1][print_backtrace] 3: /usr/local/other/benchmarks/osu_benchmarks/osu_bw() [0x42750c]
[mas-dn01-ib:mpi_rank_1][print_backtrace] 4: /usr/local/other/benchmarks/osu_benchmarks/osu_bw() [0x4f286f]
[mas-dn01-ib:mpi_rank_1][print_backtrace] 5: /usr/local/other/benchmarks/osu_benchmarks/osu_bw() [0x486cfc]
[mas-dn01-ib:mpi_rank_1][print_backtrace] 6: /usr/local/other/benchmarks/osu_benchmarks/osu_bw() [0x40deca]
[mas-dn01-ib:mpi_rank_1][print_backtrace] 7: /usr/local/other/benchmarks/osu_benchmarks/osu_bw() [0x40d6e4]
[mas-dn01-ib:mpi_rank_1][print_backtrace] 8: /usr/local/other/benchmarks/osu_benchmarks/osu_bw() [0x405572]
[mas-dn01-ib:mpi_rank_1][print_backtrace] 9: /lib64/libc.so.6(__libc_start_main+0xfd) [0x3ebd21ecdd]
[mas-dn01-ib:mpi_rank_1][print_backtrace] 10: /usr/local/other/benchmarks/osu_benchmarks/osu_bw() [0x405449]
=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
[proxy:0:0 at mas-nn-ib] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:955): assert (!closed) failed
[proxy:0:0 at mas-nn-ib] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at mas-nn-ib] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
[mpiexec at mas-nn-ib] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
[mpiexec at mas-nn-ib] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec at mas-nn-ib] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:191): launcher returned error waiting for completion
[mpiexec at mas-nn-ib] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
-----Original Message-----
From: Jonathan Perkins [mailto:perkinjo at cse.ohio-state.edu]
Sent: Friday, August 17, 2012 12:12 PM
To: Hoot Thompson
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] segmentation fault (signal 11), Exit code 139
If you're using mpirun_rsh:
mpirun_rsh -n 2 MV2_DEBUG_SHOW_BACKTRACE=1 mpiprogram

If you're using mpiexec:
mpiexec -n 2 -env MV2_DEBUG_SHOW_BACKTRACE 1 mpiprogram
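For completeness, the debug rebuild suggested below (--disable-fast together with --enable-g=dbg) would be configured roughly along these lines; the install prefix is taken from the paths used in this thread, everything else is an assumption:

```shell
# Reconfigure and rebuild MVAPICH2 with debug symbols and internal
# checks enabled (prefix matches the paths seen in this thread):
./configure --prefix=/usr/local/other/mvapich2 --enable-g=dbg --disable-fast
make && make install

# Rebuild the OSU benchmarks with -g as well, so backtraces carry symbols.
```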
On Fri, Aug 17, 2012 at 11:59:41AM -0400, Hoot Thompson wrote:
> Ok, so I recompiled with the debug flags but didn't get any additional error
> info. Where do I look for the info? How do I invoke
> MV2_DEBUG_SHOW_BACKTRACE?
>
>
> -----Original Message-----
> From: Jonathan Perkins [mailto:perkinjo at cse.ohio-state.edu]
> Sent: Friday, August 17, 2012 11:14 AM
> To: Hoot Thompson
> Cc: mvapich-discuss at cse.ohio-state.edu
> Subject: Re: [mvapich-discuss] segmentation fault (signal 11), Exit code 139
>
> On Fri, Aug 17, 2012 at 10:33:55AM -0400, Hoot Thompson wrote:
> > I have a new cluster that I’ve configured in a manner similar to other
> > systems. I'm getting the following error when running between nodes; it
> > works fine when running on the same node (either of the two).
>
> Can you tell us a little bit about the architecture of the systems, as
> well as the software environment (such as the OS and any schedulers in
> use)?
>
> I also suggest taking a look at
> https://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.8.html#x1-1120009.1.10
>
> Try using the MV2_DEBUG_SHOW_BACKTRACE parameter to see if you get more
> output. Also, when doing your debug build, use --disable-fast in
> addition to --enable-g=dbg.
>
> >
> > Hoot
> >
> >
> > [root at mas-nn-ib ~]# /usr/local/other/mvapich2/bin/mpirun -n 2 -hosts
> > mas-nn-ib,mas-dn1-ib /usr/local/other/benchmarks/osu_benchmarks/osu_bw
> > [mas-dn01-ib:mpi_rank_1][error_sighandler] Caught error: Segmentation fault (signal 11)
> >
> >
> > =====================================================================================
> > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> > = EXIT CODE: 139
> > = CLEANING UP REMAINING PROCESSES
> > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> >
> > =====================================================================================
> > [proxy:0:0 at mas-nn-ib] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:955): assert (!closed) failed
> > [proxy:0:0 at mas-nn-ib] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> > [proxy:0:0 at mas-nn-ib] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
> > [mpiexec at mas-nn-ib] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
> > [mpiexec at mas-nn-ib] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
> > [mpiexec at mas-nn-ib] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:191): launcher returned error waiting for completion
> > [mpiexec at mas-nn-ib] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
> >
> >
> >
> > [root at mas-nn-ib ~]# /usr/local/other/mvapich2/bin/mpirun -n 2 -hosts
> > mas-nn-ib,mas-nn-ib /usr/local/other/benchmarks/osu_benchmarks/osu_bw
> > # OSU MPI Bandwidth Test v3.6
> > # Size Bandwidth (MB/s)
> > 1 2.57
> > 2 5.20
> > 4 10.40
> > 8 20.71
> > 16 41.34
> > 32 82.84
> > 64 164.62
> > 128 315.43
> > 256 586.19
> > 512 1010.73
> > 1024 1576.86
> > 2048 2350.19
> > 4096 3180.26
> > 8192 3839.28
> > 16384 4255.49
> > 32768 3043.95
> > 65536 3717.39
> > 131072 3869.20
> > 262144 3585.87
> > 524288 3563.36
> > 1048576 7079.52
> > 2097152 9921.37
> > 4194304 9929.68
> >
> >
> > [root at mas-nn-ib ~]# /usr/local/other/mvapich2/bin/mpirun -n 2 -hosts
> > mas-dn1-ib,mas-dn1-ib /usr/local/other/benchmarks/osu_benchmarks/osu_bw
> > # OSU MPI Bandwidth Test v3.6
> > # Size Bandwidth (MB/s)
> > 1 2.59
> > 2 5.22
> > 4 10.44
> > 8 20.85
> > 16 41.98
> > 32 82.87
> > 64 164.76
> > 128 320.40
> > 256 591.03
> > 512 996.87
> > 1024 1555.53
> > 2048 2336.40
> > 4096 3181.73
> > 8192 3840.08
> > 16384 4256.52
> > 32768 3220.42
> > 65536 3635.69
> > 131072 3855.15
> > 262144 3562.69
> > 524288 3564.40
> > 1048576 9877.30
> > 2097152 9901.89
> > 4194304 9920.58
> >
> >
>
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss at cse.ohio-state.edu
> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
> --
> Jonathan Perkins
> http://www.cse.ohio-state.edu/~perkinjo
>
>
>
--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo