[mvapich-discuss] Strange error with MPI_REDUCE
amith rajith mamidala
mamidala at cse.ohio-state.edu
Sun Dec 9 16:03:56 EST 2007
Hi Christian,
Can you also try the patch I am attaching with this mail and let us know
how it works?
Thanks,
Amith.
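
For context on what the patch changes: a rough sketch (in Python, not the actual MVAPICH C code from intra_fns_new.c) of the two-level reduce used when shared memory and InfiniBand are mixed. It models only the mixed intra/inter-node case from the report; the function and variable names here are illustrative assumptions, not MVAPICH identifiers. The point it shows is that when a node leader reuses recvbuf as scratch for the inter-leader reduce, a non-root leader (rank 0 in the report) ends up holding the sum, whereas a separate scratch buffer (the patch's tmpbuf1) leaves recvbuf untouched on non-root ranks.

```python
def two_level_reduce(values, node_of, root, use_tmpbuf):
    """Simulate a leader-based MPI_SUM reduce (illustrative sketch only).

    values   -- each rank's send value
    node_of  -- node name per rank (ranks on the same node share a leader)
    root     -- rank that should receive the result
    use_tmpbuf -- True models the patched code (separate scratch buffer)
    """
    nranks = len(values)
    recvbuf = [0.0] * nranks          # every rank's x, initialised to 0

    # Intra-node phase: group ranks by node; the lowest rank on each
    # node acts as leader and holds that node's partial sum.
    nodes = {}
    for r in range(nranks):
        nodes.setdefault(node_of[r], []).append(r)
    node_sums = {min(rs): sum(values[r] for r in rs)
                 for rs in nodes.values()}

    # Inter-leader phase: one leader collects the global sum.
    leader_root = min(node_sums)
    total = sum(node_sums.values())
    if use_tmpbuf:
        result_at_leader = total      # patched: scratch lives in tmpbuf1
    else:
        recvbuf[leader_root] = total  # buggy: recvbuf itself is the scratch,
        result_at_leader = recvbuf[leader_root]  # clobbering a non-root rank

    # Leader forwards the final result to the real root.
    recvbuf[root] = result_at_leader
    return recvbuf


# Reproduce the reported layout: 4 ranks, 2 per node, root = 1.
print(two_level_reduce([1.0] * 4, ['A', 'A', 'B', 'B'], 1, False))
print(two_level_reduce([1.0] * 4, ['A', 'A', 'B', 'B'], 1, True))
```

In the buggy variant rank 0 (the leader of node A, but not the root) also reports the sum, matching Christian's mixed-node output; with the separate scratch buffer only the root's recvbuf is written.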
On Sat, 8 Dec 2007, Dhabaleswar Panda wrote:
> Thanks for reporting this issue. Can you tell us which version of 0.9.9
> you are using (the one available with OFED 1.2 or the one from the OSU
> site)? Which compiler are you using? Can you also check whether you see
> the same problem with the latest MVAPICH 1.0-beta (please use the latest
> version from the trunk)?
>
> In the meantime, we will also investigate this issue further.
>
> Thanks,
>
> DK
>
>
> On Fri, 7 Dec 2007, Christian Boehme wrote:
>
> > Dear list,
> >
> > we recently encountered a strange problem with MPI_REDUCE in our
> > mvapich-0.9.9 installation. Please consider the following F77 program:
> >
> >       program reduce_err
> >
> >       implicit none
> > c     FORTRAN MPI include file
> >       include 'mpif.h'
> >       integer ierr, nproc, myid
> >       real*8 x, y
> >
> >       call MPI_INIT( ierr )
> >       call MPI_COMM_SIZE( MPI_COMM_WORLD, nproc, ierr )
> >       call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
> >       x = 0
> >       y = 1
> >       call MPI_REDUCE( y, x, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 1,
> >      :                 MPI_COMM_WORLD, ierr )
> >       write(6,*) myid, ': Value for x after reduce:', x
> >       call MPI_FINALIZE( ierr )
> >
> >       stop
> >       end
> >
> > Obviously, the output should be the number of processes for myid=1
> > (the root of the reduce), and zero, i.e. the initial value of x, for
> > all other processes, since MPI_REDUCE writes the receive buffer only
> > at the root. This is also what we get when using either one process
> > per node (InfiniBand communication only) or when putting all
> > processes on one node (shared memory only):
> >
> > > mpirun_rsh -np 4 gwdm001 gwdm004 gwdm002 gwdm003 reduce_err
> > > 3 : Value for x after reduce: 0.00000000000000
> > > 2 : Value for x after reduce: 0.00000000000000
> > > 1 : Value for x after reduce: 4.00000000000000
> > > 0 : Value for x after reduce: 0.00000000000000
> >
> > However, when mixing the two, i.e., utilizing several nodes and more
> > than one process on those nodes, we also get the number of processes for
> > myid=0:
> >
> > > mpirun_rsh -np 4 gwdm001 gwdm001 gwdm002 gwdm003 reduce_err
> > > 1 : Value for x after reduce: 4.00000000000000
> > > 2 : Value for x after reduce: 0.00000000000000
> > > 3 : Value for x after reduce: 0.00000000000000
> > > 0 : Value for x after reduce: 4.00000000000000
> >
> > This behavior is rather unexpected and can seriously break some
> > programs. What could be the problem? Many thanks in advance.
> >
> > Christian Boehme
> >
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss at cse.ohio-state.edu
> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
>
>
-------------- next part --------------
Index: intra_fns_new.c
===================================================================
--- intra_fns_new.c (revision 1650)
+++ intra_fns_new.c (working copy)
@@ -5074,7 +5074,7 @@
MPI_Comm shmem_comm, leader_comm;
struct MPIR_COMMUNICATOR *comm_ptr = 0,*shmem_commptr = 0, *leader_commptr = 0;
int local_rank = -1, global_rank = -1, local_size=0, my_rank;
- void* local_buf=NULL, *tmpbuf=NULL;
+ void* local_buf=NULL, *tmpbuf=NULL, *tmpbuf1=NULL;
int stride = 0, i, is_commutative;
int leader_root, total_size=0, shmem_comm_rank;
@@ -5156,6 +5156,11 @@
MPIR_REDUCE_TAG, comm_ptr->self, &status);
}
+ if (local_rank == 0){
+ MPIR_ALLOC(tmpbuf1, MALLOC(count*extent), comm_ptr, MPI_ERR_EXHAUSTED, myname);
+ tmpbuf1 = (void *)((char*)tmpbuf1 - lb);
+ }
+
if (local_size > 1){
MPID_SHMEM_COLL_GetShmemBuf(local_size, local_rank, shmem_comm_rank, &shmem_buf);
}
@@ -5176,11 +5181,11 @@
leader_root = comm_ptr->leader_rank[leader_of_root];
if (local_size != total_size){
if (local_size > 1){
- mpi_errno = intra_Reduce(tmpbuf, recvbuf, count, datatype,
+ mpi_errno = intra_Reduce(tmpbuf, tmpbuf1, count, datatype,
op, leader_root, leader_commptr);
}
else{
- mpi_errno = intra_Reduce(sendbuf, recvbuf, count, datatype,
+ mpi_errno = intra_Reduce(sendbuf, tmpbuf1, count, datatype,
op, leader_root, leader_commptr);
}
}
@@ -5207,19 +5212,27 @@
MPID_SHMEM_COLL_SetGatherComplete(local_size, local_rank, shmem_comm_rank);
}
+ if ((local_rank == 0) && (root == my_rank)){
+ mpi_errno = MPI_Sendrecv(tmpbuf1, count, datatype->self, rank,
+ MPIR_REDUCE_TAG, recvbuf, count, datatype->self, rank,
+ MPIR_REDUCE_TAG, comm_ptr->self, &status);
+ return MPI_SUCCESS;
+ }
+
/* Copying data from leader to the root incase
* leader is not the root */
if (local_size > 1){
/* Send the message to the root if the leader is not the
* root of the reduce operation */
+
if ((local_rank == 0) && (root != my_rank) && (leader_root == global_rank)){
if (local_size == total_size){
mpi_errno = MPI_Send( tmpbuf, count, datatype->self, root,
MPIR_REDUCE_TAG, comm->self );
}
else{
- mpi_errno = MPI_Send( recvbuf, count, datatype->self, root,
+ mpi_errno = MPI_Send( tmpbuf1, count, datatype->self, root,
MPIR_REDUCE_TAG, comm->self );
}
}