[mvapich-discuss] MPI_init stall or hang up in Fortran flow solver

Jonathan Perkins perkinjo at cse.ohio-state.edu
Mon Dec 10 11:53:39 EST 2012


Have you tried running with the following?

    export GFORTRAN_UNBUFFERED_ALL=y

If you think that I/O has something to do with your issue, and the other
compilers work, then this buffering behavior could be the cause.  Here is a
pointer to the section of the 1.8.1 user guide that deals with it.

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.8.1.html#x1-1080009.1.6
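
If buffering does turn out to be the culprit, a complementary option on the
Fortran side is to flush standard output explicitly after each diagnostic
print, so that messages written before a hang actually reach the log.  A
minimal sketch (the subroutine name and messages below are illustrative,
not taken from your code):

    ! Minimal sketch: flush stdout after each diagnostic print so the
    ! output is visible even if a later call (here MPI_INIT) never returns.
    ! FLUSH is a standard Fortran 2003 statement; output_unit comes from
    ! the intrinsic iso_fortran_env module.
    subroutine parallel_init_debug
      use iso_fortran_env, only: output_unit
      implicit none
      integer :: ierr

      print *, 'entering MPI_INIT'
      flush(output_unit)

      call MPI_INIT(ierr)

      print *, 'MPI_INIT returned, ierr = ', ierr
      flush(output_unit)
    end subroutine parallel_init_debug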

On Fri, Dec 07, 2012 at 08:36:01PM -0800, Ryan Crocker wrote:
> I wish I could; right now the PG compilers are the only ones set up with 1.8.1 on the cluster.  I can get everything to work with Intel and PG, but I'd like to get GCC working too so I don't have to do MPI code tests and setups on our cluster.  I'm mostly curious why I'm having this problem, for future code development. 
> 
> On Dec 7, 2012, at 3:21 PM, Jonathan Perkins wrote:
> 
> > Thanks, that version is a bit old now.  Can you try the latest stable
> > release 1.8.1 or 1.9a2 and see if the problem is reproducible?
> > 
> > On Fri, Dec 07, 2012 at 02:16:44PM -0800, Ryan Crocker wrote:
> >> The version is MVAPICH2 1.6 compiled with GCC 4.7.0; the output of mpiname -a is:
> >> 
> >> -bash-3.2$ mpiname -a
> >> MVAPICH2 1.6 2011-03-09 ch3:mrail
> >> 
> >> Compilation
> >> CC: gcc -fpic -DNDEBUG -O2
> >> CXX: c++  -DNDEBUG -O2
> >> F77: gfortran -fpic -DNDEBUG -O2 
> >> F90: gfortran  -DNDEBUG -O2
> >> 
> >> Configuration
> >> --prefix=/nasa/mvapich2/1.6.sles11/gcc --enable-f77 --enable-f90 --enable-cxx --enable-romio --with-file-system=lustre+nfs --enable-threads=multiple --with-rdma=gen2 --with-pm=remshell
> >> 
> >> I should also add that the previous code I submitted is cleaned up a little bit; I do have a few print statements in main_init and parallel_init, and I've added them below.  I didn't think that stdin/stdout from Fortran would cause any problem, but it might.
> >> 
> >> -Ryan
> >> 
> >> ! ====================================== !
> >> subroutine main_init
> >>  use string
> >>  implicit none
> >>  character(len=str_medium) :: input_name
> >>  integer :: counter
> >> 
> >>  print*,'a1' 
> >>  ! Initialize parallel environment
> >>  call parallel_init
> >>  print*,'a2' 
> >> 
> >>  ! Initialize the random number generator
> >>  call random_init
> >>  print*,'a3' 
> >> 
> >>  ! Parse the command line
> >>  call parallel_get_inputname(input_name)
> >>  print*,'a4' 
> >> 
> >>  ! Parse the input file
> >>  call parser_init
> >>  print*,'a5' 
> >> 
> >>  call parser_parsefile(input_name)
> >>  print*,'a6' 
> >> 
> >>  ! Geometry initialization
> >>  call geometry_init
> >>  print*,'a7' 
> >> 
> >>  ! Data initialization
> >>  call data_init
> >>  print*,'a8' 
> >> 
> >>  call optdata_init
> >>  print*,'a9' 
> >> 
> >>  ! Simulation initialization
> >>  call simulation_init
> >> 
> >>  return
> >> end subroutine main_init
> >> 
> >> ! ====================================== !
> >> 
> >> subroutine parallel_init
> >>  use parallel
> >>  use parser
> >>  implicit none
> >>  integer :: ierr
> >>  integer :: size_real,size_dp
> >> 
> >>  print*,'b1_start'
> >>  ! Initialize a first basic MPI environment
> >>  call MPI_INIT(ierr)
> >>  print*,'bm1'
> >> 
> >>  call MPI_COMM_RANK(MPI_COMM_WORLD,irank,ierr)
> >>  print*,'bm2'
> >> 
> >>  call MPI_COMM_SIZE(MPI_COMM_WORLD,nproc,ierr) 
> >>  print*,'bm3'
> >> 
> >>  irank = irank+1
> >>  iroot = 1
> >>   print*,'bm4'
> >> 
> >>  ! Set MPI working precision - WP
> >>  call MPI_TYPE_SIZE(MPI_REAL,size_real,ierr)
> >>  call MPI_TYPE_SIZE(MPI_DOUBLE_PRECISION,size_dp,ierr)
> >>  if (WP .eq. size_real) then
> >>     MPI_REAL_WP = MPI_REAL
> >>  else if (WP .eq. size_dp) then
> >>     MPI_REAL_WP = MPI_DOUBLE_PRECISION
> >>  else
> >>     call parallel_kill('Error in parallel_init: no WP equivalent in MPI')
> >>  end if
> >>  print*,'b3'
> >> 
> >>  ! Set MPI single precision
> >>  call MPI_TYPE_SIZE(MPI_REAL,size_real,ierr)
> >>  call MPI_TYPE_SIZE(MPI_DOUBLE_PRECISION,size_dp,ierr)
> >>  if (SP .eq. size_real) then
> >>     MPI_REAL_SP = MPI_REAL
> >>  else if (SP .eq. size_dp) then
> >>     MPI_REAL_SP = MPI_DOUBLE_PRECISION
> >>  else
> >>     call parallel_kill('Error in parallel_init: no SP equivalent in MPI')
> >>  end if
> >>  print*,'b4'
> >> 
> >>  ! For now, comm should point to MPI_COMM_WORLD
> >>  comm = MPI_COMM_WORLD
> >>  print*,'b5'
> >> 
> >>  return
> >> end subroutine parallel_init
> >> 
> >> ! ====================================== !
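
Since every rank prints before MPI_INIT and nothing after, one way to narrow
this down is to record which node and PID each process is on before the
MPI_INIT call, so that a stuck process can be located and attached to with
gdb.  A rough sketch using GNU Fortran extensions (HOSTNM and GETPID are
gfortran intrinsics; everything else below is illustrative):

    ! Rough sketch: record the host name and PID of each process before
    ! MPI_INIT so a hung rank can be located and inspected with gdb.
    ! HOSTNM and GETPID are GNU Fortran (gfortran) intrinsic extensions.
    subroutine report_before_mpi_init
      implicit none
      character(len=64) :: host
      integer :: hstat, pid

      hstat = hostnm(host)
      pid   = getpid()
      print *, 'about to call MPI_INIT on ', trim(host), ' pid ', pid
      flush(6)   ! unit 6 is stdout
    end subroutine report_before_mpi_init

Once the job looks hung, that output tells you which node to log into and
which PID to attach gdb to (gdb -p <pid>) to see where the rank is stuck.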
> >> 
> >> On Dec 7, 2012, at 1:52 PM, Jonathan Perkins wrote:
> >> 
> >>> Thanks for your note Ryan.  Can you give us some additional info such as
> >>> the version of MVAPICH2 used and the output of mpiname -a?
> >>> 
> >>> On Fri, Dec 07, 2012 at 12:40:12PM -0800, Ryan Crocker wrote:
> >>>> Hi all, 
> >>>> 
> >>>> I'm having a problem with a hang or some sort of stall.  Basically the program will keep running, or appear to on the cluster I'm running on (I've noticed this issue on multiple clusters, actually), but nothing is happening.  When I have every processor print out, they all print right before MPI_INIT and not after.  I counted the files and all the processors are entering the call.  What makes it even odder is that the exact same simulation will run if I decrease the number of nodes/cores I'm using; i.e., it hangs on 96 cores (or more), but moving to 72 the program runs without issue.
> >>>> 
> >>>> I'm using the latest GCC compiler set with the latest version of MPICH2, configured with:
> >>>> 
> >>>> '--with-pbs=/PBS' '--with-default-comm=pmi' '--enable-pbspro-helper' 'CC=gcc' 'LDFLAGS=-lpthread' 'CPPFLAGS=-fpic'
> >>>> 
> >>>> I've switched to the Intel/MPICH2 compiler set and have not had the same problem.  I have no idea what this issue could be and have had little luck finding an answer on Stack Overflow or through Google searches; any help would be much appreciated.  Also, ask me for any more information you'd need to help.
> >>>> 
> >>>> Thanks,
> >>>> 
> >>>> My flow solver is written in Fortran; here are the subroutines leading up to the MPI_INIT call.  The whole flow solver is compiled with '-O3 -ffree-line-length-none':
> >>>> 
> >>>> ! ====================================== !
> >>>> program main
> >>>> 
> >>>> call main_init
> >>>> call simulation_run
> >>>> call main_stop
> >>>> 
> >>>> end program main
> >>>> 
> >>>> ! ====================================== !
> >>>> 
> >>>> subroutine main_init
> >>>> use string
> >>>> implicit none
> >>>> character(len=str_medium) :: input_name
> >>>> 
> >>>> call parallel_init
> >>>> 
> >>>> ! Initialize the random number generator
> >>>> call random_init
> >>>> 
> >>>> ! Parse the input file
> >>>> call parser_init
> >>>> 
> >>>> input_name='input' 
> >>>> call parser_parsefile(input_name)
> >>>> 
> >>>> ! Geometry initialization
> >>>> call geometry_init
> >>>> 
> >>>> ! Data initialization
> >>>> call data_init
> >>>> 
> >>>> ! Simulation initialization
> >>>> call simulation_init
> >>>> 
> >>>> return
> >>>> end subroutine main_init
> >>>> 
> >>>> ! ====================================== !
> >>>> 
> >>>> subroutine parallel_init
> >>>> use parallel
> >>>> use parser
> >>>> implicit none
> >>>> integer :: ierr
> >>>> integer :: size_real,size_dp
> >>>> 
> >>>> ! Initialize a first basic MPI environment
> >>>> 
> >>>> !##### This is where it stalls out #########
> >>>> 
> >>>> call MPI_INIT(ierr)
> >>>> call MPI_COMM_RANK(MPI_COMM_WORLD,irank,ierr)
> >>>> call MPI_COMM_SIZE(MPI_COMM_WORLD,nproc,ierr) 
> >>>> irank = irank+1
> >>>> iroot = 1
> >>>> 
> >>>> ! Set MPI working precision - WP
> >>>> call MPI_TYPE_SIZE(MPI_REAL,size_real,ierr)
> >>>> call MPI_TYPE_SIZE(MPI_DOUBLE_PRECISION,size_dp,ierr)
> >>>> if (WP .eq. size_real) then
> >>>>   MPI_REAL_WP = MPI_REAL
> >>>> else if (WP .eq. size_dp) then
> >>>>   MPI_REAL_WP = MPI_DOUBLE_PRECISION
> >>>> else
> >>>>   call parallel_kill('Error in parallel_init: no WP equivalent in MPI')
> >>>> end if
> >>>> 
> >>>> ! Set MPI single precision
> >>>> call MPI_TYPE_SIZE(MPI_REAL,size_real,ierr)
> >>>> call MPI_TYPE_SIZE(MPI_DOUBLE_PRECISION,size_dp,ierr)
> >>>> if (SP .eq. size_real) then
> >>>>   MPI_REAL_SP = MPI_REAL
> >>>> else if (SP .eq. size_dp) then
> >>>>   MPI_REAL_SP = MPI_DOUBLE_PRECISION
> >>>> else
> >>>>   call parallel_kill('Error in parallel_init: no SP equivalent in MPI')
> >>>> end if
> >>>> 
> >>>> ! For now, comm should point to MPI_COMM_WORLD
> >>>> comm = MPI_COMM_WORLD
> >>>> 
> >>>> return
> >>>> end subroutine parallel_init
> >>>> 
> >>>> ! ============================= !
> >>>> Ryan Crocker
> >>>> University of Vermont, School of Engineering
> >>>> Mechanical Engineering Department
> >>>> rcrocker at uvm.edu
> >>>> 315-212-7331
> >>>> 
> >>>> 
> >>> 
> >>> -- 
> >>> Jonathan Perkins
> >>> http://www.cse.ohio-state.edu/~perkinjo
> >> 
> >> Ryan Crocker
> >> University of Vermont, School of Engineering
> >> Mechanical Engineering Department
> >> rcrocker at uvm.edu
> >> 315-212-7331
> >> 
> >> 
> > 
> > -- 
> > Jonathan Perkins
> > http://www.cse.ohio-state.edu/~perkinjo
> 
> Ryan Crocker
> University of Vermont, School of Engineering
> Mechanical Engineering Department
> rcrocker at uvm.edu
> 315-212-7331
> 
> 

-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

