[mvapich-discuss] failure during Init of 3456 process job

Dan Kokron daniel.kokron at nasa.gov
Sat Mar 12 16:35:05 EST 2011


I am getting a failure during MPI_Init using mvapich2-1.6rc3, configured with:

./configure CC=icc CXX=icpc F77=ifort F90=ifort \
    CFLAGS="-fpic -DRDMA_CM" CXXFLAGS="-fpic -DRDMA_CM" \
    FFLAGS=-fpic F90FLAGS=-fpic \
    --prefix=/u/dkokron/play/mvapich2-1.6rc3/install.dbg \
    --enable-f77 --enable-f90 --enable-cxx --enable-mpe --enable-romio \
    --with-file-system=lustre --enable-threads=default --with-rdma=gen2 \
    --with-hwloc --enable-error-checking=all --enable-error-messages=all \
    --enable-g=all --enable-fast=none


mpirun_rsh -hostfile $PBS_NODEFILE -np 3456 GEOSgcm.x
Word too long.
child_handler: Error in init phase...wait for cleanup! (0/1mpispawn connections)
Failed in initilization phase, cleaned up all the mpispawn!

It seems others have seen this too, since mvapich-1.4:
http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2009-November/002623.html

I tried setting MV2_FASTSSH_THRESHOLD, but that did not help this 288-node
job. Any ideas?

mpirun_rsh -hostfile /var/spool/pbs/aux/1576484.pbspl1.nas.nasa.gov -np 3456 \
    MV2_FASTSSH_THRESHOLD=512 ./GEOSgcm.x
Word too long.
child_handler: Error in init phase...wait for cleanup! (0/1mpispawn connections)
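For what it's worth, "Word too long." is the message csh and tcsh print when a single word on a command line exceeds their internal length limit, so the login shell on the compute nodes may be relevant. Below is a minimal sketch for checking it; the helper name is_csh_family is hypothetical, and the script assumes id and getent are available on the node:

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the shell path given as $1
# names csh or tcsh, the shells that print "Word too long." when a single
# word exceeds their internal length limit.
is_csh_family() {
    # Strip everything up to the last "/" to get the shell's base name.
    case "${1##*/}" in
        csh|tcsh) return 0 ;;
        *)        return 1 ;;
    esac
}

# Look up this user's login shell in the passwd database.
# Assumes id(1) and getent(1) are available, as on most Linux systems.
login_shell=$(getent passwd "$(id -un)" | cut -d: -f7)

if is_csh_family "$login_shell"; then
    echo "login shell ${login_shell} is csh or tcsh"
else
    echo "login shell ${login_shell:-<unknown>} is not csh or tcsh"
fi
```

Running it on one of the compute nodes prints a single line naming the login shell. A csh-family result would be consistent with the "Word too long." message, though it would not by itself confirm the cause.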



Thanks
Dan