[mvapich-discuss] NaNs from non-blocking comms

Dan Kokron daniel.kokron at nasa.gov
Thu Apr 7 18:09:30 EDT 2011


Sayantan,

Hope the workshop talk went well.

Some more data points: neither mpich2-1.2.1p1 nor the latest available
MPICH2 trunk snapshot (r8363) gives NaNs when configured as follows.

http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/1.2.1p1/mpich2-1.2.1p1.tar.gz

./configure CC=icc CXX=icpc F77=ifort FC=ifort \
    CFLAGS=-fpic CXXFLAGS=-fpic FFLAGS=-fpic FCFLAGS=-fpic \
    --prefix=/discover/nobackup/projects/gmao/share/dkokron/play/MPICH2/mpich2-1.2.1p1/install \
    --enable-f77 --enable-f90 --enable-cxx --enable-romio \
    --enable-smpcoll --without-mpe

http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/trunk/mpich2-trunk-r8363.tar.gz

./configure CC=icc CXX=icpc F77=ifort FC=ifort \
    CFLAGS=-fpic CXXFLAGS=-fpic FFLAGS=-fpic FCFLAGS=-fpic \
    --prefix=$PWD/install \
    --enable-f77 --enable-fc --enable-cxx --enable-romio \
    --with-pm=hydra --enable-smpcoll --without-mpe

Dan

On Tue, 2011-04-05 at 15:32 -0500, Sayantan Sur wrote:
> Hi Dan,
> 
> Thanks for the updated code. I will ask someone to run the code on our
> end to see if we can reproduce this. I am at the OpenFabrics workshop,
> and our talk is going to be held soon.
> 
> Thanks again.
> 
> On Tue, Apr 5, 2011 at 12:50 PM, Dan Kokron <daniel.kokron at nasa.gov> wrote:
> > Updated code is attached.  I think I put the Wait in the proper place.
> > Still getting NaNs.
> >
> > mpiexec.hydra -prepend-rank -launcher-exec /usr/bin/sshmpi -np 72 ./a.out
> > [3]  NaN found           13          10         660
> > [69]  NaN found           13           9         588
> >
> > Dan
> >
> > On Tue, 2011-04-05 at 14:11 -0500, Sayantan Sur wrote:
> >> Hi Dan,
> >>
> >> Thanks for posting this example. I took a quick look at the example. I
> >> think there is a bug in the application code. The MPI standard
> >> requires that all non-blocking communications be (locally) completed
> >> before calling finalize. MPI_Barrier doesn't guarantee this. Let me
> >> know if you think I am mistaken.
> >>
> >> http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf
> >>
> >> Page 291, line 36
> >>
> >> "This routine cleans up all MPI state. Each process must call
> >> MPI_FINALIZE before it exits. Unless there has been a call to
> >> MPI_ABORT, each process must ensure that all pending nonblocking
> >> communications are (locally) complete before calling MPI_FINALIZE."
> >>
> >> Can you try inserting MPI_Wait / MPI_Waitall in your example to see if
> >> this works?
> >>
> >> Thanks!
> >>
> >> On Tue, Apr 5, 2011 at 10:59 AM, Dan Kokron <daniel.kokron at nasa.gov> wrote:
> >> > Using mvapich2-1.6 configured and built under x86_64 Linux with
> >> >
> >> > Intel-11.0.083 suite of compilers
> >> >
> >> > ./configure CC=icc CXX=icpc F77=ifort F90=ifort CFLAGS=-fpic
> >> > CXXFLAGS=-fpic FFLAGS=-fpic F90FLAGS=-fpic
> >> > --prefix=/home/dkokron/play/mvapich2-1.6/install/intel --enable-f77
> >> > --enable-f90 --enable-cxx --enable-romio --with-hwloc
> >> >
> >> > The attached example code gives NaNs as output from the MPI_Recv if
> >> > MV2_ON_DEMAND_THRESHOLD is set to less than the number of processes
> >> > used.
> >> >
> >> > The example also gives NaNs using IntelMPI-4.0.1.002 if
> >> > I_MPI_USE_DYNAMIC_CONNECTIONS=enable is set.
> >> >
> >> > See the 'commands' file in the tarball for more information.
> >> > --
> >> > Dan Kokron
> >> > Global Modeling and Assimilation Office
> >> > NASA Goddard Space Flight Center
> >> > Greenbelt, MD 20771
> >> > Daniel.S.Kokron at nasa.gov
> >> > Phone: (301) 614-5192
> >> > Fax:   (301) 614-5304
> >> >
> >> > _______________________________________________
> >> > mvapich-discuss mailing list
> >> > mvapich-discuss at cse.ohio-state.edu
> >> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >> >
> >> >
-- 
Dan Kokron
Global Modeling and Assimilation Office
NASA Goddard Space Flight Center
Greenbelt, MD 20771
Daniel.S.Kokron at nasa.gov
Phone: (301) 614-5192
Fax:   (301) 614-5304
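
For readers of the archive: the attached test code is not reproduced here, so the following is only a minimal sketch of the pattern suggested in the quoted reply, namely completing every nonblocking request with MPI_Waitall before reading the receive buffer or calling MPI_Finalize. The ring exchange, the buffer length COUNT, and the use of MPI_Irecv (the original report mentions MPI_Recv) are assumptions made for illustration and are not taken from the attachment.

```c
/* Sketch only: a hypothetical ring exchange, not the code attached to this thread. */
#include <mpi.h>
#include <stdio.h>

#define COUNT 1024  /* hypothetical message length */

int main(int argc, char **argv)
{
    int rank, size, left, right, i;
    double sendbuf[COUNT], recvbuf[COUNT];
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank sends its own rank number to its right-hand neighbour. */
    for (i = 0; i < COUNT; i++)
        sendbuf[i] = (double)rank;

    right = (rank + 1) % size;
    left  = (rank + size - 1) % size;

    /* Post the nonblocking receive and send. */
    MPI_Irecv(recvbuf, COUNT, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, COUNT, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Complete both requests before touching recvbuf or finalizing.
       An MPI_Barrier here would not complete them. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    /* Only now is recvbuf guaranteed to hold the neighbour's data. */
    if (recvbuf[0] != (double)left)
        printf("[%d] unexpected value %g\n", rank, recvbuf[0]);

    MPI_Finalize();
    return 0;
}
```

Built with an MPI compiler wrapper such as mpicc and launched with the same mpiexec.hydra command line quoted above, this should print nothing when the exchange is correct. As the thread shows, adding the Wait did not by itself remove the NaNs in the original test case, so this illustrates the MPI requirement rather than a confirmed fix.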


