[mvapich-discuss] MVAPICH2-1.6: Exit code -5

mausmi kotecha mausmikotecha at gmail.com
Fri Apr 1 08:36:10 EDT 2011


Hi,

I am trying to run the HPC Challenge benchmark on a large cluster. The cluster
size is ~7000 cores and I have been using MVAPICH2-1.6 along with the Intel
compilers.
Runs at 4000 cores work fine, but beyond 4k cores, mpispawn
starts the processes across the nodes but the job exits with the following error:

MPI process (rank: 0) terminated unexpectedly on node01
Exit code -5 signaled from node01

I compiled MVAPICH2 using:
CC=icc CXX=icc FC=ifort F90=ifort F77=ifort ./configure --prefix=$(PREFIX)
--enable-f77 --enable-f90 --enable-cxx --enable-xrc=yes --with-rdma=gen2
--with-cluster-size=large
make
make install
where $(PREFIX) is "/home/opt/MPI/mvapich2-1.6-intel".

The mpirun command I use is:
nohup mpirun_rsh -np 5000 -hostfile ./mach2 MV2_CPU_BINDING_POLICY=bunch
./hpcc
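
For completeness, this is how I would pass additional tuning variables inline to mpirun_rsh. MV2_USE_XRC and MV2_ON_DEMAND_THRESHOLD are my guesses from the MVAPICH2 userguide for large-scale runs (XRC matches the --enable-xrc build option); I have not confirmed that they help with this particular error, and the threshold value is illustrative:

```shell
# Sketch only: enable the XRC transport the library was built with and
# raise the on-demand connection threshold so connections are set up
# lazily at large scale. The MV2_* names come from the MVAPICH2 docs;
# the values here are guesses, not a verified fix.
nohup mpirun_rsh -np 5000 -hostfile ./mach2 \
    MV2_CPU_BINDING_POLICY=bunch \
    MV2_USE_XRC=1 \
    MV2_ON_DEMAND_THRESHOLD=64 \
    ./hpcc
```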

The stack size set on my nodes is "unlimited". Could you please let me know of
any runtime flags to use for large clusters to resolve this issue,
or of any workaround you know of?
Thanks in advance.
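
In case it is useful, this is roughly how I checked the limits. It is only a sketch: it assumes passwordless ssh to each node and uses the same ./mach2 hostfile as my run command; non-interactive ssh shells may not pick up limits.conf settings, which is why I check remotely rather than locally:

```shell
# Sketch: verify that stack and locked-memory limits are unlimited on
# every node listed in the hostfile, since RDMA registration typically
# also needs a high "ulimit -l" (memlock), not just stack size.
for node in $(sort -u ./mach2); do
    echo -n "$node: "
    ssh "$node" 'echo "stack=$(ulimit -s) memlock=$(ulimit -l)"'
done
```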

Regards,
Mausmi