[mvapich-discuss] MVAPICH2-1.6: Exit code -5

Jonathan Perkins perkinjo at cse.ohio-state.edu
Fri Apr 1 11:15:35 EDT 2011


Mausmi:
Hi, thanks for using mvapich2.  We'd like you to try a couple of things
to help us isolate the error.

Can you rebuild the library as follows:
$ ./configure --prefix=/home/opt/MPI/mvapich2-1.6-intel CC=icc CXX=icc \
    FC=ifort F90=ifort F77=ifort
$ make && make install

Assuming you're using bash, can you put `ulimit -c unlimited' in your
.bashrc?  This will allow a core dump to be generated if there is a
segmentation fault somewhere.
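
For example, a minimal sketch assuming a bash login shell (adjust the
path if your .bashrc lives elsewhere):
$ echo 'ulimit -c unlimited' >> ~/.bashrc
$ source ~/.bashrc
$ ulimit -c    # should now print "unlimited"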

If the error happens again, please look for files prefixed with `core'.
You can use gdb to get a backtrace from one of these core files:
$ gdb EXECUTABLE-FILE CORE-FILE

Just replace EXECUTABLE-FILE with the application you're running and
CORE-FILE with the core file you found.  You should drop into a gdb
shell, where you can run the command `bt' to print the backtrace.
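
A hypothetical session might look like this (the core file name here is
made up; yours will differ):
$ gdb ./hpcc core.12345
(gdb) bt                    # backtrace of the crashing thread
(gdb) thread apply all bt   # backtraces of every thread, if multithreaded
(gdb) quit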

Also, can you tell us whether you're using InfiniBand or iWARP?  Thanks in advance.
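
If you're not sure, something like `ibv_devinfo' on a compute node should
show the transport of each RDMA device (this assumes the libibverbs
utilities from your OFED stack are installed):
$ ibv_devinfo | grep -E 'hca_id|transport'
# a line like "transport: InfiniBand (0)" means InfiniBand,
# while "transport: iWARP (1)" means iWARP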

On Fri, Apr 1, 2011 at 8:36 AM, mausmi kotecha <mausmikotecha at gmail.com> wrote:
> Hi,
>
> I am trying to run the HPC Challenge benchmark on a large cluster. The cluster
> size is ~7000 cores and I have been using MVAPICH2-1.6 along with the Intel
> compilers.
> Runs on 4000 cores work fine, but beyond 4k cores mpispawn starts the
> processes across the nodes and then the job exits with the following error:
>
> MPI process (rank: 0) terminated unexpectedly on node01
> Exit code -5 signaled from node01
>
> I had compiled MVAPICH2 using:
> CC=icc CXX=icc FC=ifort F90=ifort F77=ifort ./configure --prefix=$(PREFIX) \
>     --enable-f77 --enable-f90 --enable-cxx --enable-xrc=yes --with-rdma=gen2 \
>     --with-cluster-size=large
> make
> make install
> where $(PREFIX) is “/home/opt/MPI/mvapich2-1.6-intel”
>
> The mpirun command I use is:
> nohup mpirun_rsh -np 5000 -hostfile ./mach2 MV2_CPU_BINDING_POLICY=bunch ./hpcc
>
> The stack size on my nodes is set to “unlimited”. Could you please let me know
> of any runtime flags to be used for large clusters to resolve this issue,
> or if you know of any workaround for this?
> Thanks in advance.
>
> Regards,
> Mausmi
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo


