[mvapich-discuss] Strange error with MPI_REDUCE
Christian Boehme
Christian_Boehme at freenet.de
Fri Dec 7 12:49:08 EST 2007
Dear list,
We recently encountered a strange problem with MPI_REDUCE in our
mvapich-0.9.9 installation. Please consider the following F77 program:
      program reduce_err
      implicit none
c     FORTRAN MPI-INCLUDE-file
      include 'mpif.h'
      integer ierr, nproc, myid
      real*8 x, y

      call MPI_INIT( ierr )
      call MPI_COMM_SIZE( MPI_COMM_WORLD, nproc, ierr )
      call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )

      x = 0
      y = 1
      call MPI_REDUCE( y, x, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 1,
     :                 MPI_COMM_WORLD, ierr )
      write(6,*) myid, ': Value for x after reduce:', x

      call MPI_FINALIZE( ierr )
      stop
      end
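The program is built with the MPI Fortran wrapper of the mvapich
installation and launched with mpirun_rsh. Assuming the source file is
called reduce_err.f and the standard mpif77 wrapper is used, a typical
build would be:

> mpif77 -o reduce_err reduce_err.f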
Obviously, the value of x after the reduce should be the number of
processes on the root rank myid=1, and remain zero on all other ranks.
This is also what we get when using either one process per node (only
InfiniBand communication) or putting all processes on one node (only
shared memory):
> mpirun_rsh -np 4 gwdm001 gwdm004 gwdm002 gwdm003 reduce_err
> 3 : Value for x after reduce: 0.00000000000000
> 2 : Value for x after reduce: 0.00000000000000
> 1 : Value for x after reduce: 4.00000000000000
> 0 : Value for x after reduce: 0.00000000000000
However, when mixing the two, i.e., using several nodes with more than
one process on some of them, the number of processes also shows up in x
on the non-root rank myid=0:
> mpirun_rsh -np 4 gwdm001 gwdm001 gwdm002 gwdm003 reduce_err
> 1 : Value for x after reduce: 4.00000000000000
> 2 : Value for x after reduce: 0.00000000000000
> 3 : Value for x after reduce: 0.00000000000000
> 0 : Value for x after reduce: 4.00000000000000
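For what it is worth, a defensive variant that only reads x on the root
rank (a minimal sketch reusing the variables of the test program above)
hides the symptom, but of course it should not be necessary:

      call MPI_REDUCE( y, x, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 1,
     :                 MPI_COMM_WORLD, ierr )
c     only rank 1 (the root) reads the reduced value; the receive
c     buffer on all other ranks is simply ignored
      if ( myid .eq. 1 ) then
         write(6,*) myid, ': Value for x after reduce:', x
      endif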
This behavior is rather unexpected, since the receive buffer should be
significant only at the root, and it can seriously break programs that
rely on it being left untouched on the other ranks. What could be the
problem? Many thanks in advance
Christian Boehme