[mvapich-discuss] Re: confirm 43b1c24702cc8a5af97208dd4d163c4ca64ede44

Dan Kokron daniel.kokron at nasa.gov
Thu May 6 15:26:00 EDT 2010


I hope this gets attached to the proper thread.

I am seeing the exact same error as Battalgazi YILDIRIM when using
mvapich2-1.4.1.  My application runs fine on up to 64 processes but dies
with the following error on more than 64 (96 in this run):

mpirun_rsh -hostfile /var/spool/PBS/aux/3671575.borgmg -np 96 ./GEOSgcm.x
Fatal error in MPI_Init_thread:
Other MPI error, error stack:
MPIR_Init_thread(311)..: Initialization failed
MPID_Init(191).........: channel initialization failed
MPIDI_CH3_Init(156)....: 
MPIDI_CH3I_CM_Init(993): Error initializing MVAPICH2 MPIU_Malloc library
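
One way to pin down exactly where between 64 and 96 the failure starts
would be a small loop like the one below.  This is only a sketch:
first_failing_np and run_gcm are hypothetical helper names, HOSTFILE is
an assumed environment variable, and only the mpirun_rsh line itself is
taken from the run above.

```shell
#!/bin/sh
# first_failing_np RUNNER NP...
# Calls "RUNNER NP" for each process count in the order given and prints
# the first count for which RUNNER exits non-zero.
# Prints nothing and returns 1 if every count succeeds.
first_failing_np() {
    runner=$1
    shift
    for np in "$@"; do
        if ! "$runner" "$np" >/dev/null 2>&1; then
            echo "$np"
            return 0
        fi
    done
    return 1
}

# Hypothetical runner wrapping the launch line from this report.
# HOSTFILE is assumed to point at the PBS node file for the job.
run_gcm() {
    mpirun_rsh -hostfile "$HOSTFILE" -np "$1" ./GEOSgcm.x
}

# Example (not executed here):
#   HOSTFILE=/var/spool/PBS/aux/3671575.borgmg \
#       first_failing_np run_gcm 64 65 66 72 80 96
```

Note that a successful launch runs the whole application, so for a long
model run it would make sense to substitute a trivial MPI test program
for ./GEOSgcm.x in run_gcm.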

I am able to run the application on more than 64 processes if mvapich2
is compiled with the ch3:sock channel.

./configure CC=gcc CXX=g++ F77=ifort F90=ifort \
  --prefix=/discover/nobackup/dkokron/mvapich2-1.4.1_debug_gcc_sock \
  --enable-g=all --enable-f77 --enable-f90 --enable-cxx --enable-romio \
  --with-device=ch3:sock


The run also fails when mvapich2 is compiled with gen2 but without rdma_cm:
./configure CC=gcc CXX=g++ F77=ifort F90=ifort \
  --prefix=/discover/nobackup/dkokron/mvapich2-1.4.1_debug_gcc_gen2-cm \
  --enable-g=all --enable-f77 --enable-f90 --enable-cxx --enable-mpe \
  --enable-romio --enable-threads=multiple --with-rdma=gen2 \
  --enable-rdma-cm=no

I added some debug prints to
src/mpid/ch3/channels/mrail/src/memory/mem_hooks.c
to see where in mvapich2_minit it is failing.  I'll report back with any
news.
-- 
Dan Kokron
Global Modeling and Assimilation Office
NASA Goddard Space Flight Center
Greenbelt, MD 20771
Daniel.S.Kokron at nasa.gov
Phone: (301) 614-5192
Fax:   (301) 614-5304


