[mvapich-discuss] Different results from different workers on MPI_Allreduce

Javier Delgado - NOAA Affiliate javier.delgado at noaa.gov
Tue Oct 20 19:17:40 EDT 2015


Hi all,

I am running a program that performs an *AllReduce* operation using the
MPI_MAXLOC operation to determine a global maximum value and rank for 3
variables by passing in a 6-element array in which the odd-numbered indices
contain the values and the even-numbered indices the rank. When run with
180 workers, 175 of them produce one value for the *maxloc* index, 4
produce another, and 1 produces yet another (i.e. I have three unique
results put into *recvbuf* among all the workers). This results in the
application later hanging since some workers are expecting the
corresponding global maximum value to be broadcast from a different rank
(e.g. task N determines that task X contains the maximum, so it waits for a
broadcast from task X, which never arrives because task X determines that
task Z contains the maximum).
One thing worth noting is that one of the tasks calculates NaN as the
global max (and itself as the worker containing it), which is odd since my
understanding is that NaNs should be ignored in MINVAL/MAXVAL as long as
not all elements of the array are NaN.
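
To illustrate how a single NaN input could produce this kind of disagreement, here is a minimal sketch that needs no MPI at all. It assumes the reduction is built from a pairwise, comparison-based combine step; the `combine` function below is a hypothetical stand-in written for illustration, not MVAPICH's actual implementation, and the three (value, rank) pairs are made-up inputs. Because every comparison against NaN is false, the result depends on the order in which the pairs are combined, and an Allreduce is free to combine commutative operations in a different order on different ranks.

```fortran
program maxloc_nan_order
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  real :: a(2), b(2), c(2), r1(2), r2(2), r3(2)
  real :: nan

  nan = ieee_value(1.0, ieee_quiet_nan)

  ! Made-up (value, rank) pairs, as three hypothetical ranks might contribute
  a = (/ 5.0, 0.0 /)
  b = (/ nan, 1.0 /)
  c = (/ 7.0, 2.0 /)

  ! The same three inputs, combined in three different orders
  r1 = combine(combine(a, b), c)
  r2 = combine(combine(c, b), a)
  r3 = combine(combine(a, c), b)

  print *, 'order (a,b),c: value =', r1(1), ' rank =', int(r1(2))
  print *, 'order (c,b),a: value =', r2(1), ' rank =', int(r2(2))
  print *, 'order (a,c),b: value =', r3(1), ' rank =', int(r3(2))

contains

  ! ASSUMPTION: a plain comparison-based MAXLOC step (larger value wins,
  ! ties go to the lower rank). Not taken from any MPI library source.
  pure function combine(x, y) result(z)
    real, intent(in) :: x(2), y(2)
    real :: z(2)
    if (x(1) > y(1)) then
      z = x
    else if (x(1) == y(1)) then
      z(1) = x(1)
      z(2) = min(x(2), y(2))
    else
      ! Also reached when either value is NaN, since both tests above fail
      z = y
    end if
  end function combine

end program maxloc_nan_order
```

With this particular combine step, the three orders pick rank 2 (value 7.0), rank 0 (value 5.0), and rank 1 (value NaN) respectively, so three distinct answers from identical inputs are at least plausible. Whether this is what happens inside a given MPI library is an open question; as far as I can tell, the MPI standard does not say how MPI_MAXLOC should treat NaN.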

My question is, is this indicative of an issue in MVAPICH, the (Intel)
compiler, or the program itself?
If NaNs are not ignored by MPI_MAXLOC, I guess the code would need to be
modified to deal with this.
This is occurring with a WRF model run, so it is difficult to provide a
simple case that reproduces the problem. Here is an excerpt of the code in
question:

call MPI_Comm_rank(local_communicator,myrank,ierr)
comm(1)=have_cen
comm(2)=myrank
comm(3)=-mingbl_mslp   ! scalar
comm(4)=myrank
comm(5)=maxgbl_wind
comm(6)=myrank
call MPI_Allreduce(comm,reduced,3,MPI_2REAL,MPI_MAXLOC,local_communicator,ierr)
mingbl_mslp=-reduced(3)
grank=reduced(4)
if(myrank==grank) then
       bcast=(/ plat,plon,real(imslp),real(jmslp) /)
endif
call MPI_Bcast(bcast,4,MPI_REAL,grank,local_communicator,ierr)
if(myrank/=grank) then
       plat=bcast(1)
       plon=bcast(2)
       imslp=bcast(3)
       jmslp=bcast(4)
endif
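
If a NaN input does turn out to be the trigger, one possible workaround is to replace NaN in the value slots of `comm` before the reduction, so that a rank holding NaN can never win MPI_MAXLOC. The sketch below is only an illustration under that assumption: `nan_to_lowest` is a hypothetical helper name, the numbers are made up, `myrank` is hard-coded in place of the MPI_Comm_rank result, and the MPI calls themselves are left out so the program runs standalone. It has not been tried inside WRF.

```fortran
program guard_nan_before_maxloc
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan, &
                                           ieee_is_nan
  implicit none
  real :: comm(6)
  integer :: myrank

  myrank = 3                                 ! stand-in for MPI_Comm_rank output

  ! Made-up values for illustration only
  comm(1) = 1.0                              ! plays the role of have_cen
  comm(3) = -1002.5                          ! plays the role of -mingbl_mslp
  comm(5) = ieee_value(1.0, ieee_quiet_nan)  ! pretend maxgbl_wind is NaN
  comm(2:6:2) = real(myrank)                 ! rank slots (even indices)

  ! Sanitize only the value slots (odd indices); the Allreduce would follow
  comm(1:5:2) = nan_to_lowest(comm(1:5:2))

  print *, comm

contains

  ! Hypothetical helper: map NaN to the most negative finite real so that
  ! it loses every "greater than" comparison against a real measurement.
  elemental function nan_to_lowest(x) result(y)
    real, intent(in) :: x
    real :: y
    if (ieee_is_nan(x)) then
      y = -huge(x)
    else
      y = x
    end if
  end function nan_to_lowest

end program guard_nan_before_maxloc
```

One design caveat: if every rank holds NaN for a variable, all of them contribute `-huge(x)`, and the winning rank is then decided by the tie-breaking rule alone, so the code after the reduction may still need to check for that sentinel value.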



Thanks much,
Javier