[mvapich-discuss] SIGSEV in F90: An MPI bug?
Brian Curtis
curtisbr at cse.ohio-state.edu
Thu Jan 31 12:36:01 EST 2008
David,
The MPI-2 documentation goes into great detail on the issues with Fortran-90 bindings (http://www.mpi-forum.org/docs/mpi-20-html/node236.htm#Node236). The behavior you are seeing should be reported to Intel.
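
For what it's worth, the usual way to sidestep the copy-in/copy-out pitfalls that section describes is to never hand MPI a pointer to a strided array section, but to broadcast an explicitly contiguous buffer and copy back. A minimal sketch (the subroutine name and its arguments are hypothetical, not taken from David's code):

```fortran
! Sketch of a safe pattern, assuming the caller holds a section of a
! larger array: copy it into a buffer that is known to be contiguous,
! broadcast the buffer, then copy the result back. This avoids relying
! on compiler-generated temporaries at the implicit MPI_BCAST interface.
subroutine bcast_section(data, n, root)
  use mpi
  implicit none
  integer, intent(in)    :: n, root
  integer, intent(inout) :: data(:,:)   ! may alias a section of a larger array
  integer, allocatable   :: buf(:,:)
  integer                :: ierr

  allocate(buf(n,n))
  buf = data(1:n,1:n)                   ! explicit copy: buf is contiguous
  call MPI_BCAST(buf, n*n, MPI_INTEGER, root, MPI_COMM_WORLD, ierr)
  data(1:n,1:n) = buf                   ! copy the broadcast result back
  deallocate(buf)
end subroutine bcast_section
```

The explicit copy makes the cost visible in the caller's code instead of hiding it in a compiler temporary whose lifetime the MPI library knows nothing about.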
Brian
On Jan 31, 2008, at 11:59 AM, David Stuebe wrote:
>
> Hi again Brian
>
> I just ran my test code on our cluster using ifort 10.1.011 and
> MVAPICH 1.0.1, but the behavior is still the same.
>
> Have you had a chance to try it on any of your test machines?
>
> David
>
>
>
>
> On Jan 25, 2008 12:31 PM, Brian Curtis <curtisbr at cse.ohio-state.edu> wrote:
> David,
>
> I did some research on this issue and it looks like you have posted the
> bug with Intel. Please let us know what you find out.
>
>
> Brian
>
> David Stuebe wrote:
> > Hi Brian
> >
> > I downloaded the public release. It seems silly, but I am not sure how to get
> > a rev number from the source... there does not seem to be a '-version'
> > option that gives more info, although I did not look too hard.
> >
> > I have not tried MVAPICH 1.0.1, but once I have Intel ifort 10 on the
> > cluster I will try 1.0.1 and see if the problem goes away.
> >
> > In the meantime, please let me know whether you can recreate the problem.
> >
> > David
> >
> > PS - Just want to make sure you understand my issue: I think it is a bad
> > idea to try to pass a non-contiguous F90 memory pointer, and I should not do
> > that... but the way that it breaks has caused me headaches for weeks now. If
> > it reliably caused a SIGSEGV on entering MPI_BCAST that would be
> > great! As it is, the problem is really hard to trace.
> >
> >
> >
> >
> > On Jan 23, 2008 3:23 PM, Brian Curtis <curtisbr at cse.ohio-state.edu> wrote:
> >
> >
> >> David,
> >>
> >> Sorry to hear you are experiencing problems with the MVAPICH2 Fortran 90
> >> interface. I will be investigating this issue, but need some additional
> >> information about your setup. What is the exact version of MVAPICH2 1.0
> >> you are utilizing (daily tarball or release)? Have you tried MVAPICH2
> >> 1.0.1?
> >>
> >> Brian
> >>
> >> David Stuebe wrote:
> >>
> >>> Hello MVAPICH
> >>> I have found a strange bug in MVAPICH2 using IFORT. The behavior is very
> >>> strange indeed - it seems to be related to how ifort deals with passing
> >>> pointers to the MVAPICH FORTRAN 90 INTERFACE.
> >>> The MPI call returns successfully, but later calls to a dummy subroutine
> >>> cause a SIGSEGV.
> >>>
> >>> Please look at the following code:
> >>>
> >>>
> >>>
> >>> !=================================================================================
> >>> !=================================================================================
> >>> !=================================================================================
> >>> ! TEST CODE FOR POSSIBLE BUG IN MVAPICH2 COMPILED WITH IFORT
> >>> ! WRITTEN BY: DAVID STUEBE
> >>> ! DATE: JAN 23, 2008
> >>> !
> >>> ! COMPILE WITH: mpif90 -xP mpi_prog.f90 -o xtest
> >>> !
> >>> ! KNOWN BEHAVIOR:
> >>> ! PASSING A NON-CONTIGUOUS POINTER TO MPI_BCAST CAUSES FAILURE OF
> >>> ! SUBROUTINES USING MULTI-DIMENSIONAL EXPLICIT SHAPE ARRAYS WITHOUT AN INTERFACE -
> >>> ! EVEN THOUGH THE MPI_BCAST COMPLETES SUCCESSFULLY, RETURNING VALID DATA.
> >>
> >>> !
> >>> ! COMMENTS:
> >>> ! I REALIZE PASSING NON-CONTIGUOUS POINTERS IS DANGEROUS - SHAME ON
> >>> ! ME FOR MAKING THAT MISTAKE. HOWEVER, IT SHOULD EITHER WORK OR NOT.
> >>> ! RETURNING SUCCESSFULLY BUT CAUSING INTERFACE ERRORS LATER IS
> >>> ! EXTREMELY DIFFICULT TO DEBUG!
> >>> !
> >>> ! CONDITIONS FOR OCCURRENCE:
> >>> ! COMPILER MUST OPTIMIZE USING 'VECTORIZATION'
> >>> ! ARRAY MUST BE 'LARGE' - SYSTEM DEPENDENT?
> >>> ! MUST BE RUN ON MORE THAN ONE NODE TO CAUSE CRASH...
> >>> ! ie Running inside one SMP box does not crash.
> >>> !
> >>> ! RUNNING UNDER MPD, ALL PROCESSES SIGSEGV
> >>> ! RUNNING UNDER MPIEXEC 0.82 FOR PBS,
> >>> ! ONLY SOME PROCESSES SIGSEGV?
> >>> !
> >>> ! ENVIRONMENTAL INFO:
> >>> ! NODES: DELL 1850 3.0GHZ, 2GB RAM, INFINIBAND PCI-EX 4X
> >>> ! SYSTEM: ROCKS 4.2
> >>> ! gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
> >>> !
> >>> ! IFORT/ICC:
> >>> ! Intel(R) Fortran Compiler for Intel(R) EM64T-based applications,
> >>> ! Version 9.1 Build 20061101 Package ID: l_fc_c_9.1.040
> >>> !
> >>> ! MVAPICH2: mpif90 for mvapich2-1.0
> >>> ! ./configure --prefix=/usr/local/share/mvapich2/1.0
> >>> !   --with-device=osu_ch3:mrail --with-rdma=vapi --with-pm=mpd --enable-f90
> >>> !   --enable-cxx --disable-romio --without-mpe
> >>> !
> >>>
> >>>
> >>> !=================================================================================
> >>> !=================================================================================
> >>> !=================================================================================
> >>> Module vars
> >>> USE MPI
> >>> implicit none
> >>>
> >>>
> >>> integer :: n,m,MYID,NPROCS
> >>> integer :: ipt
> >>>
> >>> integer, allocatable, target :: data(:,:)
> >>>
> >>> contains
> >>>
> >>> subroutine alloc_vars
> >>> implicit none
> >>>
> >>> integer Status
> >>>
> >>> allocate(data(n,m),stat=status)
> >>> if (status /=0) then
> >>> write(ipt,*) "allocation error"
> >>> stop
> >>> end if
> >>>
> >>> data = 0
> >>>
> >>> end subroutine alloc_vars
> >>>
> >>> SUBROUTINE INIT_MPI_ENV(ID,NP)
> >>>
> >>>
> >>> !===================================================================================|
> >>> ! INITIALIZE MPI ENVIRONMENT                                                        |
> >>> !===================================================================================|
> >>>
> >>> INTEGER, INTENT(OUT) :: ID,NP
> >>> INTEGER IERR
> >>>
> >>> IERR=0
> >>>
> >>> CALL MPI_INIT(IERR)
> >>> IF(IERR/=0) WRITE(*,*) "BAD MPI_INIT", ID
> >>> CALL MPI_COMM_RANK(MPI_COMM_WORLD,ID,IERR)
> >>> IF(IERR/=0) WRITE(*,*) "BAD MPI_COMM_RANK", ID
> >>> CALL MPI_COMM_SIZE(MPI_COMM_WORLD,NP,IERR)
> >>> IF(IERR/=0) WRITE(*,*) "BAD MPI_COMM_SIZE", ID
> >>>
> >>> END SUBROUTINE INIT_MPI_ENV
> >>>
> >>>
> >>>
> >>>
> >>> !==============================================================================|
> >>> SUBROUTINE PSHUTDOWN
> >>> !==============================================================================|
> >>>
> >>> INTEGER IERR
> >>>
> >>> IERR=0
> >>> CALL MPI_FINALIZE(IERR)
> >>> if(ierr /=0) write(ipt,*) "BAD MPI_FINALIZE", MYID
> >>> close(IPT)
> >>> STOP
> >>>
> >>> END SUBROUTINE PSHUTDOWN
> >>>
> >>>
> >>> SUBROUTINE CONTIGUOUS_WORKS
> >>> IMPLICIT NONE
> >>> INTEGER, pointer :: ptest(:,:)
> >>> INTEGER :: IERR, I,J
> >>>
> >>>
> >>> write(ipt,*) "START CONTIGUOUS:"
> >>> n=2000 ! Set size here...
> >>> m=n+10
> >>>
> >>> call alloc_vars
> >>> write(ipt,*) "ALLOCATED DATA"
> >>> ptest => data(1:N,1:N)
> >>>
> >>> IF (MYID == 0) ptest=6
> >>> write(ipt,*) "Made POINTER"
> >>>
> >>> call MPI_BCAST(ptest,N*N,MPI_INTEGER,0,MPI_COMM_WORLD,IERR)
> >>> IF(IERR /= 0) WRITE(IPT,*) "BAD BCAST", MYID
> >>>
> >>> write(ipt,*) "BROADCAST Data; a value:",data(1,6)
> >>>
> >>> DO I = 1,N
> >>> DO J = 1,N
> >>> if(data(I,J) /= 6) &
> >>> & write(ipt,*) "INCORRECT VALUE!", I,J,data(I,J)
> >>> END DO
> >>>
> >>> DO J = N+1,M
> >>> if(data(I,J) /= 0) &
> >>> & write(ipt,*) "INCORRECT VALUE!", I,J,data(I,J)
> >>> END DO
> >>>
> >>> END DO
> >>>
> >>> ! CALL THREE DIFFERENT EXAMPLES OF SUBROUTINES WITHOUT AN INTERFACE
> >>> ! THAT USE AN EXPLICIT SHAPE ARRAY
> >>> write(ipt,*) "CALLING DUMMY1"
> >>> CALL DUMMY1
> >>>
> >>> write(ipt,*) "CALLING DUMMY2"
> >>> call Dummy2(m,n)
> >>>
> >>> write(ipt,*) "CALLING DUMMY3"
> >>> call Dummy3
> >>> write(ipt,*) "FINISHED!"
> >>>
> >>> END SUBROUTINE CONTIGUOUS_WORKS
> >>>
> >>> SUBROUTINE NON_CONTIGUOUS_FAILS
> >>> IMPLICIT NONE
> >>> INTEGER, pointer :: ptest(:,:)
> >>> INTEGER :: IERR, I,J
> >>>
> >>>
> >>> write(ipt,*) "START NON_CONTIGUOUS:"
> >>>
> >>> m=200 ! Set size here - crash is size dependent!
> >>> n=m+10
> >>>
> >>> call alloc_vars
> >>> write(ipt,*) "ALLOCATED DATA"
> >>> ptest => data(1:M,1:M)
> >>>
> >>> !===================================================
> >>> ! IF YOU CALL DUMMY2 HERE TOO, THEN EVERYTHING PASSES ???
> >>> !===================================================
> >>> ! CALL DUMMY1 ! THIS ONE HAS NO EFFECT
> >>> ! CALL DUMMY2 ! THIS ONE 'FIXES' THE BUG
> >>>
> >>> IF (MYID == 0) ptest=6
> >>> write(ipt,*) "Made POINTER"
> >>>
> >>> call MPI_BCAST(ptest,M*M,MPI_INTEGER,0,MPI_COMM_WORLD,IERR)
> >>> IF(IERR /= 0) WRITE(IPT,*) "BAD BCAST"
> >>>
> >>> write(ipt,*) "BROADCAST Data; a value:",data(1,6)
> >>>
> >>> DO I = 1,M
> >>> DO J = 1,M
> >>> if(data(J,I) /= 6) &
> >>> & write(ipt,*) "INCORRECT VALUE!",I,J,DATA(I,J)
> >>> END DO
> >>>
> >>> DO J = M+1,N
> >>> if(data(J,I) /= 0) &
> >>> & write(ipt,*) "INCORRECT VALUE!",I,J,DATA(I,J)
> >>> END DO
> >>> END DO
> >>>
> >>> ! CALL THREE DIFFERENT EXAMPLES OF SUBROUTINES WITHOUT AN INTERFACE
> >>> ! THAT USE AN EXPLICIT SHAPE ARRAY
> >>> write(ipt,*) "CALLING DUMMY1"
> >>> CALL DUMMY1
> >>>
> >>> write(ipt,*) "CALLING DUMMY2"
> >>> call Dummy2(m,n) ! SHOULD CRASH HERE!
> >>>
> >>> write(ipt,*) "CALLING DUMMY3"
> >>> call Dummy3
> >>> write(ipt,*) "FINISHED!"
> >>>
> >>> END SUBROUTINE NON_CONTIGUOUS_FAILS
> >>>
> >>>
> >>> End Module vars
> >>>
> >>>
> >>> Program main
> >>> USE vars
> >>> implicit none
> >>>
> >>>
> >>> CALL INIT_MPI_ENV(MYID,NPROCS)
> >>>
> >>> ipt=myid+10
> >>> OPEN(ipt)
> >>>
> >>>
> >>> write(ipt,*) "Start memory test!"
> >>>
> >>> CALL NON_CONTIGUOUS_FAILS
> >>>
> >>> ! CALL CONTIGUOUS_WORKS
> >>>
> >>> write(ipt,*) "End memory test!"
> >>>
> >>> CALL PSHUTDOWN
> >>>
> >>> END Program main
> >>>
> >>>
> >>>
> >>> ! DUMMY SUBROUTINES WITH EXPLICIT SHAPE ARRAYS
> >>> ! DUMMY1 DECLARES A VECTOR - THIS ONE NEVER CAUSES FAILURE
> >>> ! DUMMY2 DECLARES AN ARRAY - THIS ONE CAUSES FAILURE
> >>>
> >>> SUBROUTINE DUMMY1
> >>> USE vars
> >>> implicit none
> >>> real, dimension(m) :: my_data
> >>>
> >>> write(ipt,*) "m,n",m,n
> >>>
> >>> write(ipt,*) "DUMMY 1", size(my_data)
> >>>
> >>> END SUBROUTINE DUMMY1
> >>>
> >>>
> >>> SUBROUTINE DUMMY2(i,j)
> >>> USE vars
> >>> implicit none
> >>> INTEGER, INTENT(IN) ::i,j
> >>>
> >>>
> >>> real, dimension(i,j) :: my_data
> >>>
> >>> write(ipt,*) "start: DUMMY 2", size(my_data)
> >>>
> >>>
> >>> END SUBROUTINE DUMMY2
> >>>
> >>> SUBROUTINE DUMMY3
> >>> USE vars
> >>> implicit none
> >>>
> >>>
> >>> real, dimension(m,n) :: my_data
> >>>
> >>>
> >>> write(ipt,*) "start: DUMMY 3", size(my_data)
> >>>
> >>>
> >>> END SUBROUTINE DUMMY3
> >>>
> >>>
> >>>
> ------------------------------------------------------------------------
> >>>
> >>> _______________________________________________
> >>> mvapich-discuss mailing list
> >>> mvapich-discuss at cse.ohio-state.edu
> >>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >>>
> >>>
> >
> >
>
More information about the mvapich-discuss
mailing list