[mvapich-discuss] Passive target communication with PSM-CH3
Mingzhe Li
li.2192 at osu.edu
Fri Oct 16 14:07:13 EDT 2015
Hi Nathan,
Thanks for your note. Currently, the PSM channel does not support truly
one-sided communication: RMA operations are emulated in software, so they make
progress only when the target process enters the MPI library. That's why the
communication executes slowly when the target is doing computation.
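As a toy illustration (plain Python threads, not MVAPICH internals), here is a
sketch of software-emulated passive-target RMA: the "origin" posts a
fetch-and-op request, but the "target" services it only when it enters its
progress engine between chunks of computation, so the origin's wait time
tracks the target's compute granularity. All names here are made up for the
sketch.

```python
import threading
import queue
import time

def emulated_fetch_and_op_wait(compute_chunk_s):
    """Return the origin's wait time for one emulated fetch-and-op when the
    target alternates computation (modeled as sleep) with progress polls."""
    requests = queue.Queue()
    done = threading.Event()

    def target():
        while not done.is_set():
            time.sleep(compute_chunk_s)      # "computation" phase
            while not requests.empty():      # progress engine entered
                requests.get().set()         # complete the pending operation

    threading.Thread(target=target, daemon=True).start()

    reply = threading.Event()
    t0 = time.perf_counter()
    requests.put(reply)                      # origin issues the fetch-and-op
    reply.wait()                             # blocks until the target polls
    wait = time.perf_counter() - t0
    done.set()
    return wait

busy = emulated_fetch_and_op_wait(0.05)      # target busy computing: ~50 ms wait
idle = emulated_fetch_and_op_wait(0.001)     # target effectively polling: short wait
```

A busy target (large compute chunk) makes the origin wait roughly one chunk
per operation, while a polling target completes it almost immediately, which
mirrors the RANK0_POLL behavior in your test program.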
Thanks,
Mingzhe
On Fri, Oct 16, 2015 at 11:51 AM, Nathan Weeks <weeks at iastate.edu> wrote:
>
> Does MVAPICH 2.1 support passive target communication with the ch3:psm
> device? We're using Intel InfiniPath QLE7340 adapters, and it seems that an
> MPI_Win_lock()/MPI_Fetch_and_op()/MPI_Win_unlock() sequence executes slowly
> if the target is doing computation.
>
> MVAPICH 2.1 was configured thus (where the Intel compiler is version
> 15.0.1):
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ./configure --prefix=/path/to/mvapich/2.1 \
> --enable-fortran=yes \
> --with-device=ch3:psm \
> --enable-threads=multiple \
> CC=icc CXX=icpc FC=ifort F77=ifort
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> In the following test program, all ranks fetch & increment a value on rank
> 0. Rank 0 also does computation if RANK0_POLL isn't defined at compile
> time.
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> program test_fetch_and_op
>    use iso_fortran_env, only: int64
>    use, intrinsic :: iso_c_binding, only: c_ptr, c_f_pointer
>    use mpi
>    implicit none
>
>    integer :: info, integer_size, i, win, ierr, rank, world_size, &
>               n_i = 0, sum_i = 0, reduce_sum_i
>    integer, pointer :: iteration
>    integer(kind=int64) :: count_rate, time1 = 0, time2 = 0, time_fetch_and_op = 0
>    integer, parameter :: one = 1, n = 1024, max_i = 2**16-1
>    integer, allocatable :: rank_i(:)
>    type(c_ptr) :: p_iteration
>    double precision :: A(n,n), B(n,n), A_min
>    double precision, allocatable :: rank_time(:)
>
>    call random_number(A)
>    call random_number(B)
>    call system_clock(COUNT_RATE = count_rate)
>
>    call MPI_Init(ierr)
>    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
>    call MPI_Comm_size(MPI_COMM_WORLD, world_size, ierr)
>    call MPI_Info_create(info, ierr)
>    call MPI_Info_set(info, "accumulate_ops", "same_op", ierr)
>    call MPI_Sizeof(i, integer_size, ierr)
>    call MPI_Win_allocate(INT(integer_size, KIND=MPI_ADDRESS_KIND), 1, info, &
>                          MPI_COMM_WORLD, p_iteration, win, ierr)
>    call C_F_POINTER(p_iteration, iteration)
>    if (rank == 0) then
>       iteration = 1
>       allocate(rank_i(world_size))
>       allocate(rank_time(world_size))
>    else
>       ! avoid Intel Fortran error in MPI_Gather when -check is used
>       allocate(rank_i(1))
>       allocate(rank_time(1))
>    end if
>
>    call MPI_Barrier(MPI_COMM_WORLD, ierr)
>
>    do while (.true.)
> #ifdef RANK0_POLL
>       if (rank == 0) then
>          call MPI_WIN_LOCK(MPI_LOCK_SHARED, 0, 0, win, ierr)
>          call MPI_Get(i, 1, MPI_INTEGER, 0, 0_MPI_ADDRESS_KIND, 1, &
>                       MPI_INTEGER, win, ierr)
>          call MPI_WIN_UNLOCK(0, win, ierr)
>          if (i > max_i) exit
>       else
> #endif
>       call system_clock(time1)
>       call MPI_WIN_LOCK(MPI_LOCK_EXCLUSIVE, 0, 0, win, ierr)
>       call MPI_FETCH_AND_OP(one, i, MPI_INTEGER, 0, 0_MPI_ADDRESS_KIND, &
>                             MPI_SUM, win, ierr)
>       call MPI_WIN_UNLOCK(0, win, ierr)
>       call system_clock(time2)
>       if (i > max_i) exit
>       time_fetch_and_op = time_fetch_and_op + (time2-time1)
>       A = A + A * B ! do some computation
>       n_i = n_i + 1
>       sum_i = sum_i + i
> #ifdef RANK0_POLL
>       end if
> #endif
>    end do
>
>    call MPI_Reduce(A(1,1), A_min, 1, MPI_DOUBLE_PRECISION, MPI_MIN, 0, &
>                    MPI_COMM_WORLD, ierr)
>    call MPI_Gather(n_i, 1, MPI_INTEGER, rank_i, 1, MPI_INTEGER, 0, &
>                    MPI_COMM_WORLD, ierr)
>    call MPI_Gather(DBLE(time_fetch_and_op)/count_rate, 1, MPI_DOUBLE_PRECISION, rank_time, &
>                    1, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
>    call MPI_Reduce(sum_i, reduce_sum_i, 1, MPI_INTEGER, MPI_SUM, 0, &
>                    MPI_COMM_WORLD, ierr)
>
>    if (rank == 0) then
>       write(*,*) 'minval(A(1,1)[:]) = ', A_min
>       do i = 1, SIZE(rank_i)
>          write(*,*) 'rank ', i-1, rank_i(i), rank_time(i)
>       end do
>       write(*,*) 'SUM(rank_i)', SUM(rank_i)
>       write(*,*) 'reduce_sum_i', reduce_sum_i
>    end if
>
>    call MPI_Win_free(win, ierr)
>    call MPI_Finalize(ierr)
> end program test_fetch_and_op
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Compiled thus:
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> mpif90 -o test_fetch_and_op test_fetch_and_op.F90
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Invoked from a job script to run on 2 nodes (16 ranks/node):
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> mpirun -n 32 ./test_fetch_and_op
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The output indicates that ranks spend varying amounts of (cumulative) time
> waiting for the MPI_Win_lock()/MPI_Fetch_and_op()/MPI_Win_unlock() sequence
> to complete (fourth column, in seconds). The third column indicates the
> number of iterations executed by that rank.
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> minval(A(1,1)[:]) = 3.013034008757427E+118
> rank 0 4585 0.988093000000000
> rank 1 2293 4.60248800000000
> rank 2 2293 1.99486700000000
> rank 3 2294 4.69011900000000
> rank 4 2294 4.74206200000000
> rank 5 2309 8.53757700000000
> rank 6 4147 6.24735400000000
> rank 7 2293 4.54687600000000
> rank 8 2417 8.60026900000000
> rank 9 2293 8.49449000000000
> rank 10 2294 8.48404600000000
> rank 11 2293 4.56649700000000
> rank 12 2293 1.91861300000000
> rank 13 2291 4.60065300000000
> rank 14 2293 1.90240000000000
> rank 15 2293 8.51508700000000
> rank 16 2292 2.43570500000000
> rank 17 2292 2.40850700000000
> rank 18 2292 2.47852800000000
> rank 19 765 3.55292800000000
> rank 20 2291 2.34879100000000
> rank 21 765 3.48627200000000
> rank 22 765 3.41105000000000
> rank 23 2292 2.61309200000000
> rank 24 765 3.45583100000000
> rank 25 765 3.56492300000000
> rank 26 870 4.03421000000000
> rank 27 2292 2.45449000000000
> rank 28 2292 2.57531800000000
> rank 29 2292 2.41837300000000
> rank 30 765 3.47952800000000
> rank 31 765 3.48015500000000
> SUM(rank_i) 65535
> reduce_sum_i 2147450880
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Compiling the code so that the rank 0 process doesn't do any computation
> produces much lower cumulative wait times for the other ranks:
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> mpif90 -DRANK0_POLL -o test_fetch_and_op test_fetch_and_op.F90
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> minval(A(1,1)[:]) = 3.920868194323862E-007
> rank 0 0 0.000000000000000E+000
> rank 1 2812 0.122509000000000
> rank 2 3824 0.175148000000000
> rank 3 1051 5.798800000000000E-002
> rank 4 1305 4.384800000000000E-002
> rank 5 1041 5.300300000000000E-002
> rank 6 1585 4.846100000000000E-002
> rank 7 1055 4.719500000000000E-002
> rank 8 1329 4.898600000000000E-002
> rank 9 3836 0.158367000000000
> rank 10 1052 5.660700000000000E-002
> rank 11 4087 8.280400000000000E-002
> rank 12 1488 5.919000000000000E-002
> rank 13 4076 0.109046000000000
> rank 14 1008 5.416400000000000E-002
> rank 15 1487 5.333600000000000E-002
> rank 16 2142 0.137507000000000
> rank 17 1553 0.117416000000000
> rank 18 2175 0.138576000000000
> rank 19 2592 0.187258000000000
> rank 20 2102 0.136301000000000
> rank 21 2174 0.136671000000000
> rank 22 2013 0.166204000000000
> rank 23 2531 0.195296000000000
> rank 24 2057 0.147237000000000
> rank 25 2127 0.138742000000000
> rank 26 2591 0.193355000000000
> rank 27 2151 0.156355000000000
> rank 28 2009 0.171320000000000
> rank 29 2121 0.140748000000000
> rank 30 1573 0.118845000000000
> rank 31 2588 0.159494000000000
> SUM(rank_i) 65535
> reduce_sum_i 2147450880
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Any advice would be appreciated. Thanks!
>
> --
> Nathan Weeks
> Systems Analyst
> Iowa State University -- Department of Mathematics
> http://weeks.public.iastate.edu/
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss