[mvapich-discuss] Re: confirm 43b1c24702cc8a5af97208dd4d163c4ca64ede44

Sayantan Sur surs at cse.ohio-state.edu
Thu May 6 16:05:43 EDT 2010


Hi Dan,

Thanks for your report. Could you help us narrow down the problem?

1) Is your application written in C/C++/Fortran or in some other language?
2) Can you upgrade to the 1.4 bugfix branch:

http://mvapich.cse.ohio-state.edu/nightly/mvapich2/branches/1.4/

3) After upgrading to the bugfix branch, could you compile MVAPICH2
with --disable-registration-cache?

Thanks!

On Thu, May 6, 2010 at 3:26 PM, Dan Kokron <daniel.kokron at nasa.gov> wrote:
> I hope this gets attached to the proper thread.
>
> I wanted to say that I am seeing the exact same error as Battalgazi
> YILDIRIM when using mvapich2-1.4.1. It runs fine with up to 64 processes
> but dies with the following error when using more than 64 processes.
>
> mpirun_rsh -hostfile /var/spool/PBS/aux/3671575.borgmg -np 96 ./GEOSgcm.x
> Fatal error in MPI_Init_thread:
> Other MPI error, error stack:
> MPIR_Init_thread(311)..: Initialization failed
> MPID_Init(191).........: channel initialization failed
> MPIDI_CH3_Init(156)....:
> MPIDI_CH3I_CM_Init(993): Error initializing MVAPICH2 MPIU_Malloc library
>
> I am able to run the application on more than 64 processes if mvapich2
> is compiled with the ch3:sock channel.
>
> ./configure CC=gcc CXX=g++ F77=ifort F90=ifort \
>   --prefix=/discover/nobackup/dkokron/mvapich2-1.4.1_debug_gcc_sock \
>   --enable-g=all --enable-f77 --enable-f90 --enable-cxx --enable-romio \
>   --with-device=ch3:sock
>
>
> It also fails when compiled for gen2 without rdma_cm:
> ./configure CC=gcc CXX=g++ F77=ifort F90=ifort \
>   --prefix=/discover/nobackup/dkokron/mvapich2-1.4.1_debug_gcc_gen2-cm \
>   --enable-g=all --enable-f77 --enable-f90 --enable-cxx --enable-mpe \
>   --enable-romio --enable-threads=multiple --with-rdma=gen2 \
>   --enable-rdma-cm=no
>
> I added some debug prints to
> src/mpid/ch3/channels/mrail/src/memory/mem_hooks.c
> to see where in mvapich2_minit it is failing. I'll report back with any
> news.
> --
> Dan Kokron
> Global Modeling and Assimilation Office
> NASA Goddard Space Flight Center
> Greenbelt, MD 20771
> Daniel.S.Kokron at nasa.gov
> Phone: (301) 614-5192
> Fax:   (301) 614-5304
>



-- 
Sayantan Sur

Research Scientist
Department of Computer Science
The Ohio State University.


