[mvapich-discuss] mvapich bug maybe when using array slicing, reproducer attached
Ben
Benjamin.M.Auer at nasa.gov
Mon Dec 30 14:11:00 EST 2013
While diagnosing a problem with some new code, we came across some
strange behaviour with MVAPICH 2.0a2 and the Intel 13.1.3.192 compiler.
Basically, we have some worker processes in our job that buffer 3D
variables, each of which is written out once it has been fully received.
The worker processes receive the data one 2D slice at a time in a loop
as that is how the data is processed on the sending end.
So on the receiver side we have a loop something like this:
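real, allocatable :: buffer(:,:,:)
integer :: mpistatus(MPI_STATUS_SIZE), ierr
! nx, ny: dimensions of one 2D slice (names assumed for this sketch)
allocate(buffer(nx,ny,nslices))
do i=1,nslices
   call MPI_RECV(buffer(:,:,i),datasize,MPI_REAL,sender_rank,tag,MPI_COMM_WORLD,mpistatus,ierr)
enddo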
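For reference, Fortran stores arrays in column-major order, so for a 3D array the section buffer(:,:,i) should be one contiguous run of nx*ny elements starting (i-1)*nx*ny elements into the array, which means datasize above is presumably meant to be nx*ny. The sketch below checks this layout without any MPI calls. The module name slice_layout, the function slice_is_contiguous_run and the sizes are hypothetical, and the is_contiguous intrinsic assumes a Fortran 2008 compiler.

```fortran
! Hypothetical helper (not part of the attached tester): checks that the
! section buffer(:,:,i) of a 3D array is one contiguous block of nx*ny
! elements starting (i-1)*nx*ny elements into the array's storage order.
module slice_layout
  implicit none
contains
  logical function slice_is_contiguous_run(nx, ny, nslices, i) result(ok)
    integer, intent(in) :: nx, ny, nslices, i
    real, allocatable :: buffer(:,:,:)
    real, allocatable :: flat(:)
    integer :: n, first

    allocate(buffer(nx, ny, nslices))
    ! Give every element a distinct value equal to its position in
    ! array element (column-major) order.
    buffer = reshape([(real(n), n = 1, nx*ny*nslices)], [nx, ny, nslices])
    flat = pack(buffer, .true.)      ! all elements, in array element order

    first = (i - 1)*nx*ny + 1        ! 1-based position of buffer(1,1,i)
    ok = is_contiguous(buffer(:,:,i)) .and. &
         all(pack(buffer(:,:,i), .true.) == flat(first:first + nx*ny - 1))
  end function slice_is_contiguous_run
end module slice_layout
```

For example, slice_is_contiguous_run(4, 3, 5, 2) should return .true., since slice 2 holds elements 13 to 24 of the 60. Because the section is contiguous, the compiler can pass its address straight through; even if it did make a temporary copy, that copy should be written back when a blocking MPI_RECV returns.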
Above a certain size of the first two dimensions of the buffer, our code
failed, and we traced the failure to the receive. Even though MPI returned
no error and reported receiving the right amount of data, the buffer was
not written to: I initialized it to a non-zero value, and sometimes the
slice is never touched by the MPI_RECV call.
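The sentinel check described above can be sketched as a small helper: fill the buffer with a value the sender never sends, then after each receive test whether the target slice still holds only that value. The module sentinel_check, the function slice_untouched and the sentinel value -999.0 are all hypothetical, not taken from the attached tester.

```fortran
! Hypothetical diagnostic helper: detects a slice that a receive left untouched.
module sentinel_check
  implicit none
  ! Assumed sentinel; it must be a value the sender never actually sends.
  real, parameter :: sentinel = -999.0
contains
  ! True if every element of buffer(:,:,i) still equals the sentinel value.
  logical function slice_untouched(buffer, i)
    real, intent(in) :: buffer(:,:,:)
    integer, intent(in) :: i
    slice_untouched = all(buffer(:,:,i) == sentinel)
  end function slice_untouched
end module sentinel_check
```

In the receive loop, one would set buffer = sentinel right after allocating it and call slice_untouched(buffer, i) right after each MPI_RECV; a .true. result flags a slice the receive reported as complete but never wrote.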
When I instead received into a 2D buffer and then copied that into the
3D buffer, the code worked:
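real, allocatable :: buffer(:,:,:)
real, allocatable :: buffer2d(:,:)
integer :: mpistatus(MPI_STATUS_SIZE), ierr
! nx, ny: dimensions of one 2D slice (names assumed for this sketch)
allocate(buffer(nx,ny,nslices), buffer2d(nx,ny))
do i=1,nslices
   call MPI_RECV(buffer2d,datasize,MPI_REAL,sender_rank,tag,MPI_COMM_WORLD,mpistatus,ierr)
   buffer(:,:,i) = buffer2d   ! copy the received slice into the 3D buffer
enddo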
I've made a little tester, attached, that reproduces this problem.
The root process simply keeps sending data to a worker process in a loop.
When I run it and _make sure the worker process receiving the data is on
a different physical node from the root process sending the data_, the
receive starts failing after a couple of iterations of the loop with the
values currently hard-coded. This worked with Open MPI, so I'm wondering:
have we uncovered an MVAPICH bug, or is putting buffer(:,:,i) in the
MPI_RECV call just not safe?
--
Ben Auer, PhD SSAI, Scientific Programmer/Analyst
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-286-9176 Fax: 301-614-6246
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpi_recvbuff.f90
Type: text/x-fortran
Size: 2124 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20131230/4453960d/attachment.bin>