[mvapich-discuss] Different results from different workers on MPI_Allreduce

Hari Subramoni subramoni.1 at osu.edu
Tue Oct 20 20:38:11 EDT 2015


Hello Javier,

This is a little surprising. Could you please send us the output of mpiname
-a and the version of Intel compilers you're using? Could you also let us
know the CPU and HCA type of the system you're running on? Are you trying
with the latest release of MVAPICH2? If not, can you please try with that?

Thx,
Hari.

On Tue, Oct 20, 2015 at 7:17 PM, Javier Delgado - NOAA Affiliate <
javier.delgado at noaa.gov> wrote:

> Hi all,
>
> I am running a program that performs an Allreduce with the MPI_MAXLOC
> operation to determine a global maximum value and owning rank for 3
> variables, by passing in a 6-element array in which the odd-numbered
> indices contain the values and the even-numbered indices the rank. When
> run with 180 workers, 175 of them produce one value for the maxloc index,
> 4 produce another, and 1 produces yet another (i.e. there are three
> distinct results in recvbuf across the workers). The application later
> hangs as a result, since some workers expect the corresponding global
> maximum value to be broadcast from a different rank (e.g. task N
> determines that task X holds the maximum, so it waits for a broadcast
> from task X, which never arrives because task X determines that task Z
> holds the maximum).
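>
> For reference, the pattern boils down to something like the following
> minimal sketch (the values here are purely illustrative, and this sketch
> by itself does not reproduce the hang). With MPI_2REAL and a count of 3,
> every rank should receive back identical (value, rank) pairs:
>
> program maxloc_pairs
>   use mpi
>   implicit none
>   integer :: myrank, ierr
>   real    :: comm(6), reduced(6)   ! three (value, rank) pairs
>
>   call MPI_Init(ierr)
>   call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
>
>   ! Values go in the odd indices, the owning rank (as a real) in the even ones.
>   comm(1) = real(myrank)      ! illustrative value #1
>   comm(2) = real(myrank)
>   comm(3) = real(2*myrank)    ! illustrative value #2
>   comm(4) = real(myrank)
>   comm(5) = real(3*myrank)    ! illustrative value #3
>   comm(6) = real(myrank)
>
>   call MPI_Allreduce(comm, reduced, 3, MPI_2REAL, MPI_MAXLOC, &
>                      MPI_COMM_WORLD, ierr)
>
>   ! All ranks should print the same winner and value for each pair.
>   print *, 'rank', myrank, ': winner of pair 2 =', int(reduced(4)), &
>            ' value =', reduced(3)
>
>   call MPI_Finalize(ierr)
> end program maxloc_pairs
>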
> One thing worth noting is that one of the tasks calculates NaN as the
> global max (and itself as the worker containing it), which is odd since my
> understanding is that NaNs should be ignored by MINVAL/MAXVAL as long as
> not all elements of the array are NaN.
>
> My question is: is this indicative of an issue in MVAPICH, the (Intel)
> compiler, or the program itself?
> If NaNs are not ignored by MAXLOC, I guess the code would need to be
> modified to deal with this (a possible guard is sketched after the code
> excerpt below).
> This is occurring with a WRF model run, so it is difficult to provide a
> simple case that reproduces the problem. Here is an excerpt of the code in
> question:
>
> call MPI_Comm_rank(local_communicator, myrank, ierr)
> ! Pack three (value, rank) pairs: values at odd indices, owning rank at even.
> comm(1) = have_cen
> comm(2) = myrank
> comm(3) = -mingbl_mslp   ! scalar; negated so MAXLOC yields the minimum MSLP
> comm(4) = myrank
> comm(5) = maxgbl_wind
> comm(6) = myrank
> call MPI_Allreduce(comm, reduced, 3, MPI_2REAL, MPI_MAXLOC, &
>                    local_communicator, ierr)
> mingbl_mslp = -reduced(3)
> grank = reduced(4)   ! rank that holds the global minimum MSLP
> if (myrank == grank) then
>    bcast = (/ plat, plon, real(imslp), real(jmslp) /)
> endif
> call MPI_Bcast(bcast, 4, MPI_REAL, grank, local_communicator, ierr)
> if (myrank /= grank) then
>    plat  = bcast(1)
>    plon  = bcast(2)
>    imslp = bcast(3)
>    jmslp = bcast(4)
> endif
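>
> If NaNs are indeed the culprit, I guess one option would be to sanitize the
> values before the reduction, roughly along these lines (just a sketch;
> sanitize_pairs is a made-up name, and it would be called on comm right
> before the MPI_Allreduce):
>
> ! Sketch only: replace any NaN value with -huge() so it can never win the
> ! MAXLOC comparison (requires the Fortran 2003 IEEE intrinsic module).
> subroutine sanitize_pairs(comm)
>   use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
>   implicit none
>   real, intent(inout) :: comm(6)
>   integer :: i
>   do i = 1, 5, 2                  ! the values live at the odd indices
>      if (ieee_is_nan(comm(i))) comm(i) = -huge(comm(i))
>   end do
> end subroutine sanitize_pairs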
>
>
>
> Thanks much,
> Javier
>

