[mvapich-discuss] segmentation fault (signal 11), Exit code 139
Ed Wahl
ewahl at osc.edu
Fri Aug 17 13:09:12 EDT 2012
From one of my PBS batch scripts. Seems to work for me (though I cannot get 1.7+ to give me a real stack dump, backtraces are working fine):
export MV2_DEBUG_SHOW_BACKTRACE=1
For the dumps to work right (when they are produced):
export MV2_DEBUG_CORESIZE=unlimited
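Taken together, a minimal job-script fragment would look like the following (a sketch assuming a POSIX shell; the `env | grep` sanity check at the end is mine, not part of the original script):

```shell
#!/bin/sh
# Print a backtrace when an MVAPICH2 process hits a fatal signal
export MV2_DEBUG_SHOW_BACKTRACE=1
# Lift the core-file size limit so dumps are not truncated
export MV2_DEBUG_CORESIZE=unlimited

# Sanity check: confirm both variables are set before mpirun is invoked
env | grep '^MV2_DEBUG'
```

Both variables must be set in the environment that launches mpirun (e.g. inside the PBS script itself), or the remote ranks will not see them.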
Ed Wahl
OSC
________________________________________
From: mvapich-discuss-bounces at cse.ohio-state.edu [mvapich-discuss-bounces at cse.ohio-state.edu] on behalf of Hoot Thompson [hoot at ptpnow.com]
Sent: Friday, August 17, 2012 11:59 AM
To: 'Jonathan Perkins'
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: RE: [mvapich-discuss] segmentation fault (signal 11), Exit code 139
Ok, so I recompiled with the debug flags but didn't get any additional error
info. Where do I look for the info? How do I invoke
MV2_DEBUG_SHOW_BACKTRACE?
-----Original Message-----
From: Jonathan Perkins [mailto:perkinjo at cse.ohio-state.edu]
Sent: Friday, August 17, 2012 11:14 AM
To: Hoot Thompson
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] segmentation fault (signal 11), Exit code 139
On Fri, Aug 17, 2012 at 10:33:55AM -0400, Hoot Thompson wrote:
> I have a new cluster that I’ve configured in a manner similar to other
> systems. I am getting the following error when running between nodes; it
> works fine when running on the same node (either of the two).
Can you tell us a little about the architecture of the systems as well
as the software environment (such as the OS and any schedulers in use)?
I also suggest taking a look at
https://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.8.html#x1-1120009.1.10
Try using the MV2_DEBUG_SHOW_BACKTRACE parameter to see if you get more
output. Also when doing your debug build, use --disable-fast in
addition to --enable-g=dbg.
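To invoke MV2_DEBUG_SHOW_BACKTRACE, set it as an environment variable for the launch command; the inline `VAR=value command` shell form scopes it to that one run. A sketch (the real mpirun line in the comment reuses hostnames and paths from the logs below; the stand-in `sh -c` command is mine, used only so the snippet runs without MPI installed):

```shell
# MV2_DEBUG_SHOW_BACKTRACE is read from the environment at launch time.
# A real invocation against the debug build would look like:
#   MV2_DEBUG_SHOW_BACKTRACE=1 /usr/local/other/mvapich2/bin/mpirun -n 2 \
#       -hosts mas-nn-ib,mas-dn1-ib \
#       /usr/local/other/benchmarks/osu_benchmarks/osu_bw
# The inline VAR=value prefix exports the variable only for that command,
# demonstrated here with a stand-in command instead of mpirun:
MV2_DEBUG_SHOW_BACKTRACE=1 sh -c 'echo "backtrace=$MV2_DEBUG_SHOW_BACKTRACE"'
```

For this to produce useful symbol names, the library itself should be the debug build mentioned above (configured with --enable-g=dbg --disable-fast).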
>
> Hoot
>
>
> [root at mas-nn-ib ~]# /usr/local/other/mvapich2/bin/mpirun -n 2 -hosts
> mas-nn-ib,mas-dn1-ib /usr/local/other/benchmarks/osu_benchmarks/osu_bw
> [mas-dn01-ib:mpi_rank_1][error_sighandler] Caught error: Segmentation fault
> (signal 11)
>
>
> =====================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 139
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> =====================================================================================
> [proxy:0:0 at mas-nn-ib] HYD_pmcd_pmip_control_cmd_cb
> (./pm/pmiserv/pmip_cb.c:955): assert (!closed) failed
> [proxy:0:0 at mas-nn-ib] HYDT_dmxu_poll_wait_for_event
> (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0 at mas-nn-ib] main (./pm/pmiserv/pmip.c:226): demux engine error
> waiting for event
> [mpiexec at mas-nn-ib] HYDT_bscu_wait_for_completion
> (./tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated
> badly; aborting
> [mpiexec at mas-nn-ib] HYDT_bsci_wait_for_completion
> (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for
> completion
> [mpiexec at mas-nn-ib] HYD_pmci_wait_for_completion
> (./pm/pmiserv/pmiserv_pmci.c:191): launcher returned error waiting for
> completion
> [mpiexec at mas-nn-ib] main (./ui/mpich/mpiexec.c:405): process manager error
> waiting for completion
>
>
>
> [root at mas-nn-ib ~]# /usr/local/other/mvapich2/bin/mpirun -n 2 -hosts
> mas-nn-ib,mas-nn-ib /usr/local/other/benchmarks/osu_benchmarks/osu_bw
> # OSU MPI Bandwidth Test v3.6
> # Size Bandwidth (MB/s)
> 1 2.57
> 2 5.20
> 4 10.40
> 8 20.71
> 16 41.34
> 32 82.84
> 64 164.62
> 128 315.43
> 256 586.19
> 512 1010.73
> 1024 1576.86
> 2048 2350.19
> 4096 3180.26
> 8192 3839.28
> 16384 4255.49
> 32768 3043.95
> 65536 3717.39
> 131072 3869.20
> 262144 3585.87
> 524288 3563.36
> 1048576 7079.52
> 2097152 9921.37
> 4194304 9929.68
>
>
> [root at mas-nn-ib ~]# /usr/local/other/mvapich2/bin/mpirun -n 2 -hosts
> mas-dn1-ib,mas-dn1-ib /usr/local/other/benchmarks/osu_benchmarks/osu_bw
> # OSU MPI Bandwidth Test v3.6
> # Size Bandwidth (MB/s)
> 1 2.59
> 2 5.22
> 4 10.44
> 8 20.85
> 16 41.98
> 32 82.87
> 64 164.76
> 128 320.40
> 256 591.03
> 512 996.87
> 1024 1555.53
> 2048 2336.40
> 4096 3181.73
> 8192 3840.08
> 16384 4256.52
> 32768 3220.42
> 65536 3635.69
> 131072 3855.15
> 262144 3562.69
> 524288 3564.40
> 1048576 9877.30
> 2097152 9901.89
> 4194304 9920.58
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo