[mvapich-discuss] Different results from different workers on MPI_Allreduce

Panda, Dhabaleswar panda at cse.ohio-state.edu
Tue Oct 20 21:34:29 EDT 2015


Hi Javier,

You are using an ancient version of MVAPICH2. Version 1.8 was released in April 2012
and is more than 3.5 years old. The latest GA version is 2.1, the latest release is 2.2a,
and the new 2.2b will be coming out soon.

Many new features (including conformance with the latest MPI standard), performance enhancements, and bug fixes come with every new release.

I suggest that you upgrade to the latest 2.1 GA version. Otherwise, it will be very hard to provide
support for such an ancient version.

Thanks,

DK





________________________________
From: mvapich-discuss-bounces at cse.ohio-state.edu on behalf of Javier Delgado - NOAA Affiliate [javier.delgado at noaa.gov]
Sent: Tuesday, October 20, 2015 9:20 PM
To: Subramoni, Hari
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Different results from different workers on MPI_Allreduce

Hi Hari,

Here is the output of "mpiname -a":

MVAPICH2 1.8 Mon Apr 30 14:56:40 EDT 2012 ch3:mrail

Compilation
CC: icc    -DNDEBUG -DNVALGRIND -O2
CXX: icpc   -DNDEBUG -DNVALGRIND -O2
F77: ifort   -O2 -L/usr/lib64
FC: ifort   -O2

Configuration
CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/apps/mvapich2/1.8-r5609-intel --with-rdma=gen2 --with-ib-libpath=/usr/lib64 --enable-romio=yes --with-file-system=lustre+panfs --enable-shared


I am using version 12.1.4 of the Intel compiler. I normally use MVAPICH2 version 1.8, although I also tried with 1.9 and got the same result. I have not tried compiling WRF with 1.9 and rerunning; I can try that next. Version 1.9 is the newest available on the system.

The CPU is an Intel Xeon E5-2650 v2 @ 2.60GHz.
The interconnect is QDR InfiniBand. Does that answer your question about the HCA type, or did you need something else?

Please let me know if you have any other questions.


Thanks,
Javier


On Tue, Oct 20, 2015 at 8:38 PM, Hari Subramoni <subramoni.1 at osu.edu> wrote:
Hello Javier,

This is a little surprising. Could you please send us the output of mpiname -a and the version of Intel compilers you're using? Could you also let us know the CPU and HCA type of the system you're running on? Are you trying with the latest release of MVAPICH2? If not, can you please try with that?

Thx,
Hari.

On Tue, Oct 20, 2015 at 7:17 PM, Javier Delgado - NOAA Affiliate <javier.delgado at noaa.gov> wrote:
Hi all,

I am running a program that performs an MPI_Allreduce with the MPI_MAXLOC operation to determine a global maximum value and owning rank for 3 variables, passing in a 6-element array in which the odd-numbered indices contain the values and the even-numbered indices the rank. When run with 180 workers, 175 of them produce one value for the maxloc index, 4 produce another, and 1 produces yet another (i.e. three unique results end up in recvbuf across all the workers). This later causes the application to hang, since some workers expect the corresponding global maximum value to be broadcast from a different rank (e.g. task N determines that task X contains the maximum, so it waits for a broadcast from task X, which never arrives because task X determines that task Z contains the maximum).
One thing worth noting is that one of the tasks calculates NaN as the global max (and itself as the worker containing it), which is odd, since my understanding is that NaNs should be ignored by MINVAL/MAXVAL as long as not all elements of the array are NaN.

My question is: is this indicative of an issue in MVAPICH, the (Intel) compiler, or the program itself?
If NaNs are not ignored by MPI_MAXLOC, I guess the code would need to be modified to deal with this; a rough sketch of what I mean is below.
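
For illustration only (this is not the actual code; the variable names follow the excerpt further down), something along these lines would keep a NaN from ever winning the reduction:

use, intrinsic :: ieee_arithmetic, only: ieee_is_nan

! Replace NaN candidates with sentinels that can never win MPI_MAXLOC.
! mingbl_mslp is negated before packing, so +huge becomes -huge there.
if (ieee_is_nan(maxgbl_wind)) maxgbl_wind = -huge(maxgbl_wind)
if (ieee_is_nan(mingbl_mslp)) mingbl_mslp =  huge(mingbl_mslp)

(If the ieee_arithmetic module is unavailable, the usual x /= x check is a fallback, though aggressive floating-point optimization can defeat it.)
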
This is occurring with a WRF model run, so it is difficult to provide a simple case that reproduces the problem. Here is an excerpt of the code in question:

call MPI_Comm_rank(local_communicator, myrank, ierr)

! Pack three (value, rank) pairs for the MPI_MAXLOC reduction.
comm(1) = have_cen
comm(2) = myrank
comm(3) = -mingbl_mslp   ! scalar; negated so the smallest MSLP wins the max
comm(4) = myrank
comm(5) = maxgbl_wind
comm(6) = myrank

call MPI_Allreduce(comm, reduced, 3, MPI_2REAL, MPI_MAXLOC, local_communicator, ierr)

mingbl_mslp = -reduced(3)
grank = reduced(4)   ! rank that owns the global minimum MSLP

! That rank broadcasts its local values to all other ranks.
if (myrank == grank) then
   bcast = (/ plat, plon, real(imslp), real(jmslp) /)
endif
call MPI_Bcast(bcast, 4, MPI_REAL, grank, local_communicator, ierr)
if (myrank /= grank) then
   plat  = bcast(1)
   plon  = bcast(2)
   imslp = bcast(3)
   jmslp = bcast(4)
endif
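
For reference, here is a minimal standalone sketch (illustrative only, not taken from WRF) that exercises the same MPI_2REAL/MPI_MAXLOC pattern; if the reduction behaves, every rank should print identical winning ranks:

program maxloc_check
  use mpi
  implicit none
  integer :: myrank, nprocs, ierr, i
  real    :: comm(6), reduced(6)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Three (value, rank) pairs, mirroring the excerpt above.
  do i = 1, 5, 2
     comm(i)   = real(mod(myrank * (i + 7), 97))   ! arbitrary per-rank values
     comm(i+1) = real(myrank)
  end do

  call MPI_Allreduce(comm, reduced, 3, MPI_2REAL, MPI_MAXLOC, &
                     MPI_COMM_WORLD, ierr)

  ! All ranks should agree on all three winning ranks.
  print '(a,i4,a,3f8.1)', 'rank ', myrank, ' winners: ', &
        reduced(2), reduced(4), reduced(6)

  call MPI_Finalize(ierr)
end program maxloc_check

Built with mpif90 and run at the same scale, any disagreement in the printed winners would suggest the problem lies below the application level.
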



Thanks much,
Javier

_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss


