[mvapich-discuss] segment fault when ENABLE_AFFINITY?

Jonathan Perkins perkinjo at cse.ohio-state.edu
Wed Aug 15 10:34:44 EDT 2012


Hi all. This problem was debugged further offline, and we came up with
a patch that resolves it.  The patch is currently available in the 1.8
branch, and tonight's nightly tarball will also contain it.
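
For anyone who cannot pick up the patched build right away and needs to
collect a usable backtrace, the debug-build and core-inspection steps
discussed in the quoted messages below can be sketched roughly as
follows.  This is only an outline, not an official procedure: the
configure options are the ones quoted in this thread, while the install
prefix, the binary path, and the core file name are placeholders assumed
for illustration.

```shell
#!/bin/sh
# Debug-build options quoted in this thread: keep debug symbols and turn
# off the "fast" build so backtraces carry line numbers and arguments.
DEBUG_FLAGS="--enable-g=dbg --disable-fast"

# Assumption: a scratch install prefix; adjust for your site.
PREFIX="$HOME/mvapich2-debug"

# Run from the top of an MVAPICH2 source tree; skipped anywhere else.
# $DEBUG_FLAGS is left unquoted on purpose so it splits into two options.
if [ -x ./configure ]; then
    ./configure --prefix="$PREFIX" $DEBUG_FLAGS \
        --with-limic2 --with-pm=no --with-pmi=slurm \
        CC=icc CXX=icpc F77=ifort FC=ifort
    make && make install
fi

# Assumption: the benchmark was rebuilt against the debug library, rerun,
# and left a core file at this placeholder path.
BINARY=./osu_bw
CORE=./core

# Print the backtrace non-interactively when gdb and the core are present.
if command -v gdb >/dev/null 2>&1 && [ -f "$CORE" ]; then
    gdb -batch -ex bt "$BINARY" "$CORE"
fi
```

Both steps are skipped when ./configure or the core file is not present,
so the script is safe to try outside a source tree.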

On Tue, Aug 07, 2012 at 11:32:14PM +0800, M Xie wrote:
> Thank you.
> 
> I used --enable-fast in the configuration.
> I will use the Intel 12.1 compiler to build a new MPI library.
> 
> I built a debug version today (using --enable-g=dbg) with Intel 11.1,
> and it still segfaults. The core dump is in the attachment.
> 
> 
> 2012/8/7 Jonathan Perkins <perkinjo at cse.ohio-state.edu>:
> > One of our developers pointed out that you may not have used the
> > --disable-fast option when you configured originally.  Can you make sure
> > to add this so that we can get line numbers and function arguments from
> > your next backtrace if you still encounter this issue?
> >
> > On Tue, Aug 07, 2012 at 09:33:45AM -0400, Jonathan Perkins wrote:
> >> Hello, can you try a fresh build?  Everything works fine for me.  I've
> >> built using intel 12.1 with limic2 support enabled.  I compiled with
> >> debug flags since it looks like that is what you may have used (based on
> >> the debug symbols from the backtrace).  This was run on a cluster local
> >> to OSU.
> >>
> >> [perkinjo at head osu-micro-benchmarks]$ srun -n 2 osu_bw
> >> # OSU MPI Bandwidth Test v3.6
> >> # Size      Bandwidth (MB/s)
> >> 1                       1.91
> >> 2                       3.78
> >> 4                       7.59
> >> 8                      15.41
> >> 16                     30.78
> >> 32                     59.47
> >> 64                    119.38
> >> 128                   239.74
> >> 256                   467.99
> >> 512                   888.25
> >> 1024                 1569.65
> >> 2048                 2968.76
> >> 4096                 4915.56
> >> 8192                 4490.64
> >> 16384                5678.56
> >> 32768                7658.10
> >> 65536                8942.07
> >> 131072               9605.61
> >> 262144               8787.58
> >> 524288               8656.53
> >> 1048576              8830.52
> >> 2097152              8437.81
> >> 4194304              7984.88
> >> [perkinjo at head osu-micro-benchmarks]$ ../../bin/mpiname -a
> >> MVAPICH2 1.8 Mon Apr 30 14:50:19 EDT 2012 ch3:mrail
> >>
> >> Compilation
> >> CC: icc    -g
> >> CXX: icpc   -g
> >> F77: ifort   -g
> >> FC: ifort   -g
> >>
> >> Configuration
> >> --prefix=/home/perkinjo/mvapich-discuss/slurm-hwloc-seg-fault/mvapich2-1.8/install --with-limic2 --with-pm=no --with-pmi=slurm CC=icc CXX=icpc F77=ifort FC=ifort --enable-g=dbg --disable-fast
> >>
> >> [perkinjo at head osu-micro-benchmarks]$ module show intel
> >> -------------------------------------------------------------------
> >> /etc/modulefiles/intel/latest:
> >>
> >> module-whatis    loads intel composer 12.1 compiler settings
> >> prepend-path     PATH /opt/intel/composer_xe_2011_sp1.9.293/bin
> >> prepend-path     PATH /opt/intel/composer_xe_2011_sp1.9.293/bin/intel64/
> >> prepend-path     MANPATH /opt/intel/composer_xe_2011_sp1.9.293/man
> >> prepend-path     LD_LIBRARY_PATH /opt/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64
> >> -------------------------------------------------------------------
> >>
> >> [perkinjo at head osu-micro-benchmarks]$
> >>
> >> --
> >> Jonathan Perkins
> >> http://www.cse.ohio-state.edu/~perkinjo
> >>
> >
> > --
> > Jonathan Perkins
> > http://www.cse.ohio-state.edu/~perkinjo

> GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /tmp/osu_bw...(no debugging symbols found)...done.
> Reading symbols from /usr/lib64/libpmi.so.0...done.
> Loaded symbols for /usr/lib64/libpmi.so.0
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /usr/lib/liblimic2.so.0...done.
> Loaded symbols for /usr/lib/liblimic2.so.0
> Reading symbols from /usr/lib64/librdmacm.so.1...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib64/librdmacm.so.1
> Reading symbols from /usr/lib64/libibverbs.so.1...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib64/libibverbs.so.1
> Reading symbols from /usr/lib64/libibumad.so.2...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib64/libibumad.so.2
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libgcc_s.so.1
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /usr/lib64/libslurm.so.22...done.
> Loaded symbols for /usr/lib64/libslurm.so.22
> Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libnss_files.so.2
> Reading symbols from /usr/lib64/slurm/auth_none.so...done.
> Loaded symbols for /usr/lib64/slurm/auth_none.so
> Reading symbols from /usr/lib64/libmthca-rdmav2.so...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib64/libmthca-rdmav2.so
> Reading symbols from /usr/lib64/libmlx4-rdmav2.so...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib64/libmlx4-rdmav2.so
> Core was generated by `./osu_bw'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x000000000045323f in _int_malloc (av=0x832c0, bytes=126) at /tmp/mvapich2-1.8/src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:4048
> 4048    /tmp/mvapich2-1.8/src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c: No such file or directory.
>         in /tmp/mvapich2-1.8/src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c
> (gdb) bt
> #0  0x000000000045323f in _int_malloc (av=0x832c0, bytes=126) at /tmp/mvapich2-1.8/src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:4048
> #1  0x000000000045754a in calloc (n=537280, elem_size=126) at /tmp/mvapich2-1.8/src/mpid/ch3/channels/common/src/memory/ptmalloc2/mvapich_malloc.c:3764
> #2  0x00000000004207a8 in MPIDI_CH3I_SMP_init (pg=0x832c0) at ch3_smp_progress.c:1179
> #3  0x00000000004cca16 in MPIDI_CH3_Init (has_parent=537280, pg=0x7e, pg_rank=386218432) at ch3_init.c:325
> #4  0x000000000047eee6 in MPID_Init (argc=0x832c0, argv=0x7e, requested=386218432, provided=0x774eb0, has_args=0x774ed8, has_env=0x0) at mpid_init.c:294
> #5  0x0000000000413947 in MPIR_Init_thread (argc=0x832c0, argv=0x7e, required=386218432, provided=0x774eb0) at initthread.c:406
> #6  0x000000000041348b in PMPI_Init (argc=0x832c0, argv=0x7e) at init.c:155
> #7  0x000000000040e2c1 in main ()
> (gdb) 
> 


-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

