[mvapich-discuss] Fwd: MPI_Abort

Jonathan Perkins perkinjo at cse.ohio-state.edu
Thu Feb 2 13:34:37 EST 2012


Karen:
Thank you for your posting.  We are looking into this issue and will
get back to you with our findings.

On Thu, Feb 2, 2012 at 12:27 PM, Karen Tomko <ktomko at osc.edu> wrote:
> Hi All,
> While doing some testing on our new cluster, I've observed that MPI_Abort
> does not seem to terminate all of the processes as expected. I originally
> observed this with an app that was crashing on some missing input files
> but would then sit until the walltime was exceeded in the batch script. I've
> tried the simple test case below on both oakley (the new cluster) and glenn
> (the existing system). On glenn the test case terminates immediately. On
> oakley the test case sits until the walltime is exceeded. Both systems use
> mpiexec to launch jobs under Torque/MOAB. Any idea why MPI_Abort is not
> terminating the processes as expected?
> -Karen
>
> [ktomko at oakley01 TEST]$ cat ~/MPI_Examples/Hello-abort.f
> c  Fortran example
>        program hello
>        include 'mpif.h'
>        integer rank, size, ierror, tag, status(MPI_STATUS_SIZE)
>
>        call MPI_INIT(ierror)
>        call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
>        call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
>        print*, 'proc ', rank, ' of ', size, ': Hello world'
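> c  The last rank aborts the job; MPI_ABORT takes (comm, errorcode, ierror).
> c  The errorcode value 1 is arbitrary.
>        if ((size .gt. 1) .and. (rank .eq. size-1)) then
>           call MPI_ABORT(MPI_COMM_WORLD, 1, ierror)
>        endif
>        call MPI_BARRIER(MPI_COMM_WORLD, ierror)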
>        call MPI_FINALIZE(ierror)
>        end
>
> The MVAPICH2 version on Oakley is:
> [ktomko at oakley01 TEST]$ mpiname -a
> MVAPICH2 1.7 unreleased development copy ch3:mrail
>
> Compilation
> CC: icc    -DNDEBUG -DNVALGRIND -O2
> CXX: icpc   -DNDEBUG -DNVALGRIND -O2
> F77: ifort   -O2
> FC: ifort   -O2
>
> Configuration
> --prefix=/usr/local/mvapich2/1.7-r5140-intel --enable-shared --with-mpe
> --enable-romio --with-file-system=ufs+nfs
>
> On Glenn it is:
> [ktomko at opt-login03 ~/MPI_Examples]$ mpiname -a
> MVAPICH2 1.6 2011-03-09 ch3:mrail
>
> Compilation
> CC: pgcc -noswitcherror -fPIC  -I/usr/local/pvfs2/include -g -DNDEBUG -O2
> CXX: pgCC -noswitcherror -fPIC  -g -DNDEBUG -O2
> F77: pgf77 -noswitcherror -fPIC  -g -DNDEBUG
> F90: pgf90 -noswitcherror -fPIC  -g -DNDEBUG
>
> Configuration
> --prefix=/usr/local/mpi/mvapich2-1.6-pgi --with-rdma=gen2 --with-pm=mpd
> --with-mpe --enable-debug --enable-g=dbg --enable-sharedlibs=gcc
> --enable-romio --with-file-system=ufs+nfs+pvfs2
>
> --
> Karen Tomko
> Ohio Supercomputer Center
> 614-292-1091
> ktomko at osc.edu



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo


