[mvapich-discuss] mvapich bug maybe when using array slicing, reproducer attached

Ben Benjamin.M.Auer at nasa.gov
Mon Dec 30 14:11:00 EST 2013


While diagnosing a problem with some new code, we came across some 
strange behaviour with mvapich 2.0a2 and Intel 13.1.3.192.

Basically, some worker processes in our job buffer 3D variables, which 
are written out once they have been fully received.
The worker processes receive the data one 2D slice at a time in a loop, 
as that is how the data is processed on the sending end.
So on the receiver side we have a loop that looks something like this:

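real, allocatable :: buffer(:,:,:)
integer :: mpistatus(MPI_STATUS_SIZE), status, i

! nx, ny: slice dimensions (illustrative names)
allocate(buffer(nx, ny, nslices))

do i = 1, nslices
   ! receive one 2D slice straight into the 3D buffer
   call MPI_RECV(buffer(:,:,i), datasize, MPI_REAL, sender_rank, tag, &
                 MPI_COMM_WORLD, mpistatus, status)
enddo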
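Because Fortran stores arrays in column-major order, fixing the last subscript selects one contiguous run of size(buffer,1)*size(buffer,2) elements, which is presumably what datasize is in the call above. A minimal sketch that checks this layout without MPI is below; the module and function names are hypothetical and not taken from the attached reproducer. It only confirms the element order of an allocatable array; it cannot tell whether a given compiler passes the section to MPI_RECV directly or through a temporary copy.

```fortran
! Hypothetical helper, not taken from the attached reproducer.
module slice_layout
  implicit none
contains
  ! .true. if slice i of a 3D array equals the run of n2d = nx*ny elements
  ! starting at flat position (i-1)*n2d + 1 in array element order
  ! (column-major: the first subscript varies fastest).
  logical function slice_is_flat_run(buffer, i)
    real, intent(in) :: buffer(:,:,:)
    integer, intent(in) :: i
    real, allocatable :: flat(:)
    integer :: n2d

    n2d = size(buffer, 1) * size(buffer, 2)
    flat = reshape(buffer, [size(buffer)])      ! whole array in element order
    slice_is_flat_run = all(reshape(buffer(:,:,i), [n2d]) == &
                            flat((i - 1) * n2d + 1 : i * n2d))
  end function slice_is_flat_run
end module slice_layout
```

Filling the array with distinct values (1, 2, 3, ...) and calling slice_is_flat_run(buffer, i) for every i returns .true., consistent with each slice being a contiguous block of an allocatable array.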

Above a certain size of the first 2 dimensions of the buffer, our code 
was failing, and we traced it to the receive. Somehow, despite MPI not 
returning an error and reporting that it received the right amount of 
data, the buffer was never written to. I initialized it to a non-zero 
value, and the buffer variable was sometimes never touched by the 
MPI_RECV call.
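The sentinel check described above can be sketched as a small helper. This is a minimal sketch, not the code from the attached tester: the module name, the function count_untouched, and the value -999.0 are all assumptions; any value the sender never transmits would serve as the sentinel.

```fortran
! Hypothetical helpers illustrating the sentinel check described above.
module sentinel_check
  implicit none
  ! Assumed sentinel: any value the sender never transmits will do.
  real, parameter :: SENTINEL = -999.0
contains
  ! Number of elements in a received 2D slice that still hold the
  ! sentinel, i.e. that the receive apparently never wrote to.
  integer function count_untouched(slice)
    real, intent(in) :: slice(:,:)
    count_untouched = count(slice == SENTINEL)
  end function count_untouched
end module sentinel_check
```

In the receive loop one would assign buffer = SENTINEL after the allocate and test count_untouched(buffer(:,:,i)) > 0 after each MPI_RECV; a nonzero count even though the call returned no error and reported the full message length is the symptom described above.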

When I instead received into a 2D buffer and then copied that into the 
3D buffer, the code worked:

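real, allocatable :: buffer(:,:,:)
real, allocatable :: buffer2d(:,:)
integer :: mpistatus(MPI_STATUS_SIZE), status, i

! nx, ny: slice dimensions (illustrative names)
allocate(buffer(nx, ny, nslices), buffer2d(nx, ny))

do i = 1, nslices
   ! receive into a separate 2D array, then copy into the 3D buffer
   call MPI_RECV(buffer2d, datasize, MPI_REAL, sender_rank, tag, &
                 MPI_COMM_WORLD, mpistatus, status)
   buffer(:,:,i) = buffer2d
enddo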


I've made a little tester, attached, that reproduces this problem. 
Basically, the root process just keeps sending data to a worker process 
in a loop. When I run this and _make sure the worker process receiving 
the data is on a different physical node than the root process sending 
the data_, the receive starts failing after a couple of iterations of 
the loop with the sizes I currently have hard-coded. This worked with 
Open MPI, so I'm wondering: have we uncovered an mvapich bug, or is 
putting buffer(:,:,i) in the MPI_RECV call just not safe?

-- 
Ben Auer, PhD   SSAI, Scientific Programmer/Analyst
NASA GSFC,  Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD  20771
Phone: 301-286-9176               Fax: 301-614-6246

-------------- next part --------------
Attachment: mpi_recvbuff.f90 (text/x-fortran, 2124 bytes)
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20131230/4453960d/attachment.bin>

