[mvapich-discuss] MVAPICH2-1.6: Exit code -5

mausmi kotecha mausmikotecha at gmail.com
Sat Apr 2 09:22:06 EDT 2011


Hi Jonathan,
Thanks a lot for your reply. I got the backtrace as you suggested.
It looks like a heap memory problem (a crash in free()) in the
MPIRandomAccess routine. Please find the trace below:


Reading symbols from /usr/lib64/libcxgb3-rdmav2.so...Reading symbols from
/usr/lib/debug/usr/lib64/libcxgb3-rdmav2.so.debug...done.
done.
Loaded symbols for /usr/lib64/libcxgb3-rdmav2.so
Core was generated by `./hpcc'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002b1dcf3d98ce in _int_free () from
/home/opt/MPI/mvapich2-1.6-intel/lib/libmpich.so.1.2

(gdb) bt
#0  0x00002b1dcf3d98ce in _int_free () from
/home/opt/MPI/mvapich2-1.6-intel/lib/libmpich.so.1.2
#1  0x00002b1dcf3db69c in free () from
/home/opt/MPI/mvapich2-1.6-intel/lib/libmpich.so.1.2
#2  0x0000000000412d9c in AnyNodesMPIRandomAccessUpdate ()
#3  0x0000000000410b6e in HPCC_MPIRandomAccess ()
#4  0x0000000000403b4f in main ()

I am using Infiniband. Please let me know if you need additional details.
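
If line numbers inside the HPCC frames would help, I can rebuild hpcc with
debug symbols; a rough sketch of what I would change (the Make.<arch>
variable names follow the stock HPCC/HPL build system and may differ in my
setup, and the version number is only illustrative):

$ cd hpcc-1.4.1                  # illustrative version
$ vi hpl/Make.<arch>             # add -g (and ideally -O0) to CCFLAGS, e.g.
      CCFLAGS = $(HPL_DEFS) -g -O0
$ make arch=<arch>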

Regards,
Mausmi

On Fri, Apr 1, 2011 at 8:45 PM, Jonathan Perkins <
perkinjo at cse.ohio-state.edu> wrote:

> Mausmi:
> Hi, thanks for using mvapich2.  We'd like you to try a couple things
> to help isolate the error.
>
> Can you rebuild the library as follows:
> $ ./configure --prefix=/home/opt/MPI/mvapich2-1.6-intel CC=icc CXX=icc
> FC=ifort F90=ifort F77=ifort
> $ make && make install
>
> Assuming you're using bash, can you put `ulimit -c unlimited' in your
> .bashrc?  This will allow a core dump to be generated if there is a
> segmentation fault somewhere.
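>
> For example (assuming bash, and that core files land in the job's
> working directory; the core_pattern setting on your nodes may differ):
> $ echo 'ulimit -c unlimited' >> ~/.bashrc
> $ ulimit -c    # check in a fresh shell; should print `unlimited'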
>
> If the error happens again please look for files prefixed with `core'.
>  You can use gdb to get a backtrace with this file.
> $ gdb EXECUTABLE-FILE CORE-FILE
>
> Just replace EXECUTABLE-FILE with the application you're running and
> CORE-FILE with the core file you found.  You should drop into a gdb
> shell and you can run the command `bt' from here.
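>
> For example, if a core file named `core.12345' were found (the name is
> just an illustration):
> $ gdb ./hpcc core.12345
> (gdb) bt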
>
> Also, can you tell us if you're using Infiniband or iWARP?  Thanks in
> advance.
>
> On Fri, Apr 1, 2011 at 8:36 AM, mausmi kotecha <mausmikotecha at gmail.com>
> wrote:
> > Hi,
> >
> > I am trying to run the HPC Challenge benchmark on a large cluster. The
> > cluster size is ~7000 cores and I have been using MVAPICH2-1.6 along
> > with Intel compilers.
> > A run on 4000 cores works fine, but going beyond 4K cores, mpispawn
> > starts the processes across the nodes and then the job exits with the
> > following error:
> >
> > MPI process (rank: 0) terminated unexpectedly on node01
> > Exit code -5 signaled from node01
> >
> > I had compiled MVAPICH2 using:
> > CC=icc CXX=icc FC=ifort F90=ifort F77=ifort ./configure --prefix=$(PREFIX)
> > --enable-f77 --enable-f90 --enable-cxx --enable-xrc=yes --with-rdma=gen2
> > --with-cluster-size=large
> > make
> > make install
> > where $(PREFIX) is “/home/opt/MPI/mvapich2-1.6-intel”.
> >
> > The mpirun command I use is:
> > nohup mpirun_rsh -np 5000 -hostfile ./mach2 MV2_CPU_BINDING_POLICY=bunch
> > ./hpcc
> >
> > The stack size set on my nodes is “unlimited”. Could you please let me
> > know of any runtime flags to be used for large clusters to resolve this
> > issue, or if you know of any workaround for this?
> > Thanks in advance.
> >
> > Regards,
> > Mausmi
>
> --
> Jonathan Perkins
> http://www.cse.ohio-state.edu/~perkinjo
>