[mvapich-discuss] Strange error with MPI_REDUCE

Christian Boehme Christian_Boehme at freenet.de
Fri Dec 7 12:49:08 EST 2007


Dear list,

We recently encountered a strange problem with MPI_REDUCE in our 
mvapich-0.9.9 installation. Please consider the following Fortran 77 
program:

       program reduce_err

       implicit none
c FORTRAN MPI-INCLUDE-file
       include 'mpif.h'
       integer ierr, nproc, myid
       real*8  x , y

       call MPI_INIT( ierr )
       call MPI_COMM_SIZE( MPI_COMM_WORLD, nproc, ierr )
       call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
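c Every rank contributes y = 1; x is preset to 0 on all ranks
       x = 0
       y = 1
c Sum y over all ranks into x on the root, which is rank 1 here
       call MPI_REDUCE( y, x, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 1,
     :                  MPI_COMM_WORLD, ierr )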
       write(6,*) myid, ': Value for x after reduce:', x
       call MPI_FINALIZE( ierr )

       stop
       end

The output should be the number of processes for myid=1 (the root of 
the reduction) and zero for all other processes. This is indeed what we 
get when we either run one process per node (InfiniBand communication 
only) or put all processes on one node (shared memory only):

> mpirun_rsh -np 4 gwdm001 gwdm004 gwdm002 gwdm003 reduce_err
>            3 : Value for x after reduce:   0.00000000000000
>            2 : Value for x after reduce:   0.00000000000000
>            1 : Value for x after reduce:   4.00000000000000
>            0 : Value for x after reduce:   0.00000000000000

However, when mixing the two, i.e., using several nodes with more than 
one process on at least one of them (gwdm001 appears twice below), we 
also get the number of processes for myid=0:

> mpirun_rsh -np 4 gwdm001 gwdm001 gwdm002 gwdm003 reduce_err
>            1 : Value for x after reduce:   4.00000000000000
>            2 : Value for x after reduce:   0.00000000000000
>            3 : Value for x after reduce:   0.00000000000000
>            0 : Value for x after reduce:   4.00000000000000
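
A variant of the test that might help narrow this down is sketched 
below. It presets x to a sentinel value (-1, which the sum of ones can 
never produce), so that a non-root x that was left alone can be told 
apart from one that was written to. This is only a sketch: the names 
reduce_chk, root, and sentinel are made up for illustration, and it 
assumes the same mpif.h setup as the program above. Note that the MPI 
standard describes the receive buffer of MPI_REDUCE as significant only 
at the root.

```fortran
       program reduce_chk

       implicit none
c FORTRAN MPI-INCLUDE-file
       include 'mpif.h'
       integer ierr, nproc, myid, root
       real*8  x, y, sentinel
c root and sentinel are illustrative choices, not from the original
       parameter ( root = 1, sentinel = -1.0d0 )

       call MPI_INIT( ierr )
       call MPI_COMM_SIZE( MPI_COMM_WORLD, nproc, ierr )
       call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
c Preset x to a value that the sum of ones cannot produce
       x = sentinel
       y = 1.0d0
       call MPI_REDUCE( y, x, 1, MPI_DOUBLE_PRECISION, MPI_SUM, root,
     :                  MPI_COMM_WORLD, ierr )
c Only the root is expected to hold the sum; elsewhere report
c whether x still carries the sentinel
       if ( myid .eq. root ) then
          write(6,*) myid, ': sum on root:', x
       else if ( x .ne. sentinel ) then
          write(6,*) myid, ': non-root x was overwritten with', x
       else
          write(6,*) myid, ': non-root x untouched'
       end if
       call MPI_FINALIZE( ierr )

       end
```

Running this with the same two host lists as above should show whether 
only the mixed configuration writes to x on a non-root process.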

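This behavior is unexpected, since x was set to zero on myid=0 before 
the call, and it can seriously break programs that rely on the receive 
buffer staying unchanged on non-root processes. What could be the 
problem? Many thanks in advance.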

Christian Boehme


