[mvapich-discuss] SIGSEGV in F90: An MPI bug?

Brian Curtis curtisbr at cse.ohio-state.edu
Fri Jan 25 12:31:28 EST 2008


David,

I did some research on this issue, and it looks like you have filed the 
bug with Intel.  Please let us know what you find out.


Brian

David Stuebe wrote:
> Hi Brian
>
> I downloaded the public release. It seems silly, but I am not sure how to get
> a revision number from the source... there does not seem to be a '-version'
> option that gives more info, although I did not look too hard.
>
> I have not tried MVAPICH2 1.0.1, but once I have Intel ifort 10 on the
> cluster I will try 1.0.1 and see if the problem goes away.
>
> In the meantime, please let me know whether you can reproduce the problem.
>
> David
>
> PS - Just to make sure you understand my issue: I think it is a bad
> idea to try to pass a non-contiguous F90 memory pointer, and I should not do
> that... but the way it breaks has caused me headaches for weeks now. If
> it reliably caused a SIGSEGV on entering MPI_BCAST, that would be great! As it
> is, the problem is really hard to trace.
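>
> (A contiguous-copy workaround looks roughly like the sketch below. It is
> illustrative only: it assumes the helper lives in a module so callers see an
> explicit interface, and the name BCAST_SECTION_SAFE is made up.)
>
>   SUBROUTINE BCAST_SECTION_SAFE(A, M, ROOT)
>     ! Hypothetical helper: broadcast the M x M corner of A by staging it
>     ! through a contiguous buffer, so MPI_BCAST never receives a
>     ! non-contiguous array section.
>     USE MPI
>     IMPLICIT NONE
>     INTEGER, INTENT(INOUT) :: A(:,:)
>     INTEGER, INTENT(IN)    :: M, ROOT
>     INTEGER, ALLOCATABLE   :: BUF(:,:)
>     INTEGER :: IERR
>
>     ALLOCATE(BUF(M,M))
>     BUF = A(1:M,1:M)                  ! contiguous copy of the section
>     CALL MPI_BCAST(BUF, M*M, MPI_INTEGER, ROOT, MPI_COMM_WORLD, IERR)
>     A(1:M,1:M) = BUF                  ! copy the broadcast data back
>     DEALLOCATE(BUF)
>   END SUBROUTINE BCAST_SECTION_SAFE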
>
>
>
>
> On Jan 23, 2008 3:23 PM, Brian Curtis <curtisbr at cse.ohio-state.edu> wrote:
>
>   
>> David,
>>
>> Sorry to hear you are experiencing problems with the MVAPICH2 Fortran 90
>> interface.  I will be investigating this issue, but need some additional
>> information about your setup.  What is the exact version of MVAPICH2 1.0
>> you are utilizing (daily tarball or release)?  Have you tried MVAPICH2
>> 1.0.1?
>>
>> Brian
>>
>> David Stuebe wrote:
>>     
>>> Hello MVAPICH
>>> I have found a strange bug in MVAPICH2 using IFORT. The behavior is very
>>> odd indeed - it seems to be related to how ifort deals with passing
>>> pointers to the MVAPICH2 Fortran 90 interface.
>>> The MPI call returns successfully, but later calls to a dummy subroutine
>>> cause a SIGSEGV.
>>>
>>>  Please look at the following code:
>>>
>>>
>>> !=================================================================================
>>> !=================================================================================
>>> !=================================================================================
>>> ! TEST CODE FOR A POSSIBLE BUG IN MVAPICH2 COMPILED WITH IFORT
>>> ! WRITTEN BY: DAVID STUEBE
>>> ! DATE: JAN 23, 2008
>>> !
>>> ! COMPILE WITH: mpif90 -xP mpi_prog.f90 -o xtest
>>> !
>>> ! KNOWN BEHAVIOR:
>>> ! PASSING A NON-CONTIGUOUS POINTER TO MPI_BCAST CAUSES FAILURE OF
>>> ! SUBROUTINES USING MULTI-DIMENSIONAL EXPLICIT SHAPE ARRAYS WITHOUT AN
>>> ! INTERFACE - EVEN THOUGH THE MPI_BCAST COMPLETES SUCCESSFULLY,
>>> ! RETURNING VALID DATA.
>>> !
>>> ! COMMENTS:
>>> ! I REALIZE PASSING NON-CONTIGUOUS POINTERS IS DANGEROUS - SHAME ON
>>> ! ME FOR MAKING THAT MISTAKE. HOWEVER, IT SHOULD EITHER WORK OR NOT.
>>> ! RETURNING SUCCESSFULLY BUT CAUSING INTERFACE ERRORS LATER IS
>>> ! EXTREMELY DIFFICULT TO DEBUG!
>>> !
>>> ! CONDITIONS FOR OCCURRENCE:
>>> !    COMPILER MUST OPTIMIZE USING 'VECTORIZATION'
>>> !    ARRAY MUST BE 'LARGE' - SYSTEM DEPENDENT?
>>> !    MUST BE RUN ON MORE THAN ONE NODE TO CAUSE A CRASH...
>>> !    i.e., running inside one SMP box does not crash.
>>> !
>>> !    RUNNING UNDER MPD, ALL PROCESSES SIGSEGV
>>> !    RUNNING UNDER MPIEXEC 0.82 FOR PBS,
>>> !       ONLY SOME PROCESSES SIGSEGV?
>>> !
>>> ! ENVIRONMENTAL INFO:
>>> ! NODES: DELL 1850 3.0GHZ, 2GB RAM, INFINIBAND PCI-EX 4X
>>> ! SYSTEM: ROCKS 4.2
>>> ! gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
>>> !
>>> ! IFORT/ICC:
>>> !   Intel(R) Fortran Compiler for Intel(R) EM64T-based applications,
>>> !   Version 9.1 Build 20061101 Package ID: l_fc_c_9.1.040
>>> !
>>> ! MVAPICH2: mpif90 for mvapich2-1.0
>>> ! ./configure --prefix=/usr/local/share/mvapich2/1.0
>>> !   --with-device=osu_ch3:mrail --with-rdma=vapi --with-pm=mpd --enable-f90
>>> !   --enable-cxx --disable-romio --without-mpe
>>> !
>>>
>>> !=================================================================================
>>> !=================================================================================
>>> !=================================================================================
>>> Module vars
>>>   USE MPI
>>>   implicit none
>>>
>>>
>>>   integer :: n,m,MYID,NPROCS
>>>   integer :: ipt
>>>
>>>   integer, allocatable, target :: data(:,:)
>>>
>>>   contains
>>>
>>>     subroutine alloc_vars
>>>       implicit none
>>>
>>>       integer Status
>>>
>>>       allocate(data(n,m),stat=status)
>>>       if (status /=0) then
>>>          write(ipt,*) "allocation error"
>>>          stop
>>>       end if
>>>
>>>       data = 0
>>>
>>>     end subroutine alloc_vars
>>>
>>>    SUBROUTINE INIT_MPI_ENV(ID,NP)
>>>
>>> !===================================================================================|
>>> !  INITIALIZE MPI ENVIRONMENT                                                       |
>>> !===================================================================================|
>>>      INTEGER, INTENT(OUT) :: ID,NP
>>>      INTEGER IERR
>>>
>>>      IERR=0
>>>
>>>      CALL MPI_INIT(IERR)
>>>      IF(IERR/=0) WRITE(*,*) "BAD MPI_INIT", ID
>>>      CALL MPI_COMM_RANK(MPI_COMM_WORLD,ID,IERR)
>>>      IF(IERR/=0) WRITE(*,*) "BAD MPI_COMM_RANK", ID
>>>      CALL MPI_COMM_SIZE(MPI_COMM_WORLD,NP,IERR)
>>>      IF(IERR/=0) WRITE(*,*) "BAD MPI_COMM_SIZE", ID
>>>
>>>    END SUBROUTINE INIT_MPI_ENV
>>>
>>>
>>>
>>> !==============================================================================|
>>>   SUBROUTINE PSHUTDOWN
>>> !==============================================================================|
>>>     INTEGER IERR
>>>
>>>     IERR=0
>>>     CALL MPI_FINALIZE(IERR)
>>>     if(ierr /=0) write(ipt,*) "BAD MPI_FINALIZE", MYID
>>>     close(IPT)
>>>     STOP
>>>
>>>   END SUBROUTINE PSHUTDOWN
>>>
>>>
>>>   SUBROUTINE CONTIGUOUS_WORKS
>>>     IMPLICIT NONE
>>>     INTEGER, pointer :: ptest(:,:)
>>>     INTEGER :: IERR, I,J
>>>
>>>
>>>     write(ipt,*) "START CONTIGUOUS:"
>>>     n=2000 ! Set size here...
>>>     m=n+10
>>>
>>>     call alloc_vars
>>>     write(ipt,*) "ALLOCATED DATA"
>>>     ptest => data(1:N,1:N)
>>>
>>>     IF (MYID == 0) ptest=6
>>>     write(ipt,*) "Made POINTER"
>>>
>>>     call MPI_BCAST(ptest,N*N,MPI_INTEGER,0,MPI_COMM_WORLD,IERR)
>>>     IF(IERR /= 0) WRITE(IPT,*) "BAD BCAST", MYID
>>>
>>>     write(ipt,*) "BROADCAST Data; a value:",data(1,6)
>>>
>>>     DO I = 1,N
>>>        DO J = 1,N
>>>           if(data(I,J) /= 6) &
>>>                & write(ipt,*) "INCORRECT VALUE!", I,J,data(I,J)
>>>        END DO
>>>
>>>        DO J = N+1,M
>>>           if(data(I,J) /= 0) &
>>>                & write(ipt,*) "INCORRECT VALUE!", I,J,data(I,J)
>>>        END DO
>>>
>>>     END DO
>>>
>>>     ! CALL THREE DIFFERENT EXAMPLES OF SUBROUTINES W/OUT AN INTERFACE
>>>     ! THAT USE AN EXPLICIT SHAPE ARRAY
>>>     write(ipt,*) "CALLING DUMMY1"
>>>     CALL DUMMY1
>>>
>>>     write(ipt,*) "CALLING DUMMY2"
>>>     call Dummy2(m,n)
>>>
>>>     write(ipt,*) "CALLING DUMMY3"
>>>     call Dummy3
>>>     write(ipt,*) "FINISHED!"
>>>
>>>   END SUBROUTINE CONTIGUOUS_WORKS
>>>
>>>   SUBROUTINE NON_CONTIGUOUS_FAILS
>>>     IMPLICIT NONE
>>>     INTEGER, pointer :: ptest(:,:)
>>>     INTEGER :: IERR, I,J
>>>
>>>
>>>     write(ipt,*) "START NON_CONTIGUOUS:"
>>>
>>>     m=200 ! Set size here - crash is size dependent!
>>>     n=m+10
>>>
>>>     call alloc_vars
>>>     write(ipt,*) "ALLOCATED DATA"
>>>     ptest => data(1:M,1:M)
>>>
>>> !===================================================
>>> ! IF YOU CALL DUMMY2 HERE TOO, THEN EVERYTHING PASSES  ???
>>> !===================================================
>>> !    CALL DUMMY1 ! THIS ONE HAS NO EFFECT
>>> !    CALL DUMMY2 ! THIS ONE 'FIXES' THE BUG
>>>
>>>     IF (MYID == 0) ptest=6
>>>     write(ipt,*) "Made POINTER"
>>>
>>>     call MPI_BCAST(ptest,M*M,MPI_INTEGER,0,MPI_COMM_WORLD,IERR)
>>>     IF(IERR /= 0) WRITE(IPT,*) "BAD BCAST"
>>>
>>>     write(ipt,*) "BROADCAST Data; a value:",data(1,6)
>>>
>>>     DO I = 1,M
>>>        DO J = 1,M
>>>           if(data(J,I) /= 6) &
>>>                & write(ipt,*) "INCORRECT VALUE!", I,J,data(J,I)
>>>        END DO
>>>
>>>        DO J = M+1,N
>>>           if(data(J,I) /= 0) &
>>>                & write(ipt,*) "INCORRECT VALUE!", I,J,data(J,I)
>>>        END DO
>>>     END DO
>>>
>>>     ! CALL THREE DIFFERENT EXAMPLES OF SUBROUTINES W/OUT AN INTERFACE
>>>     ! THAT USE AN EXPLICIT SHAPE ARRAY
>>>     write(ipt,*) "CALLING DUMMY1"
>>>     CALL DUMMY1
>>>
>>>     write(ipt,*) "CALLING DUMMY2"
>>>     call Dummy2(m,n) ! SHOULD CRASH HERE!
>>>
>>>     write(ipt,*) "CALLING DUMMY3"
>>>     call Dummy3
>>>     write(ipt,*) "FINISHED!"
>>>
>>>   END SUBROUTINE NON_CONTIGUOUS_FAILS
>>>
>>>
>>>   End Module vars
>>>
>>>
>>> Program main
>>>   USE vars
>>>   implicit none
>>>
>>>
>>>   CALL INIT_MPI_ENV(MYID,NPROCS)
>>>
>>>   ipt=myid+10
>>>   OPEN(ipt)
>>>
>>>
>>>   write(ipt,*) "Start memory test!"
>>>
>>>   CALL NON_CONTIGUOUS_FAILS
>>>
>>> !  CALL CONTIGUOUS_WORKS
>>>
>>>   write(ipt,*) "End memory test!"
>>>
>>>   CALL PSHUTDOWN
>>>
>>> END Program main
>>>
>>>
>>>
>>> ! THREE DUMMY SUBROUTINES WITH EXPLICIT SHAPE ARRAYS
>>> ! DUMMY1 DECLARES A VECTOR             - THIS ONE NEVER CAUSES FAILURE
>>> ! DUMMY2 DECLARES AN ARRAY (i,j)       - THIS ONE CAUSES FAILURE
>>> ! DUMMY3 DECLARES AN ARRAY (m,n) USING THE MODULE VARIABLES
>>>
>>> SUBROUTINE DUMMY1
>>>   USE vars
>>>   implicit none
>>>   real, dimension(m) :: my_data
>>>
>>>   write(ipt,*) "m,n",m,n
>>>
>>>   write(ipt,*) "DUMMY 1", size(my_data)
>>>
>>> END SUBROUTINE DUMMY1
>>>
>>>
>>> SUBROUTINE DUMMY2(i,j)
>>>   USE vars
>>>   implicit none
>>>   INTEGER, INTENT(IN) ::i,j
>>>
>>>
>>>   real, dimension(i,j) :: my_data
>>>
>>>   write(ipt,*) "start: DUMMY 2", size(my_data)
>>>
>>>
>>> END SUBROUTINE DUMMY2
>>>
>>> SUBROUTINE DUMMY3
>>>   USE vars
>>>   implicit none
>>>
>>>
>>>   real, dimension(m,n) :: my_data
>>>
>>>
>>>   write(ipt,*) "start: DUMMY 3", size(my_data)
>>>
>>>
>>> END SUBROUTINE DUMMY3
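>>>
>>> ! SKETCH ONLY (NOT PART OF THE TEST CODE ABOVE): IF THE NON-CONTIGUOUS
>>> ! M x M SECTION REALLY NEEDS TO BE BROADCAST IN PLACE, ONE SAFE APPROACH
>>> ! IS AN MPI DERIVED DATATYPE DESCRIBING THE SECTION'S LAYOUT IN data(n,m).
>>> ! THE ROUTINE NAME BCAST_SUBMATRIX IS MADE UP FOR ILLUSTRATION.
>>>
>>> SUBROUTINE BCAST_SUBMATRIX
>>>   USE vars
>>>   IMPLICIT NONE
>>>   INTEGER :: SECTYPE, IERR
>>>
>>>   ! M COLUMNS OF M CONTIGUOUS INTEGERS, SEPARATED BY THE FULL LEADING
>>>   ! DIMENSION N OF data(N,M)
>>>   CALL MPI_TYPE_VECTOR(M, M, N, MPI_INTEGER, SECTYPE, IERR)
>>>   CALL MPI_TYPE_COMMIT(SECTYPE, IERR)
>>>
>>>   ! THE SECTION STARTS AT data(1,1), SO THE WHOLE ARRAY SERVES AS THE
>>>   ! BUFFER BASE ADDRESS
>>>   CALL MPI_BCAST(data, 1, SECTYPE, 0, MPI_COMM_WORLD, IERR)
>>>
>>>   CALL MPI_TYPE_FREE(SECTYPE, IERR)
>>> END SUBROUTINE BCAST_SUBMATRIX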
>>>
>>>
>
>   

