[mvapich-discuss] collectives fail under mvapich2-1.0 (fwd)

wei huang huanwei at cse.ohio-state.edu
Tue Oct 2 11:34:11 EDT 2007


Hi Ed,

We looked further into the one-sided problem. We do not see the program
hanging inside the MPI library; in fact, the program is not hanging at all.
In the MPI_Win_test measurement, however, we found the following code:

  if (get_measurement_rank() == 0) {
    reduced_group = exclude_rank_from_group(0, onesided_group);
    mpiassert = extract_onesided_assertions(assertion, "MPI_Win_post");
    MPI_Win_post(reduced_group, mpiassert, onesided_win);

    start_time = start_synchronization();
    MPI_Win_test(onesided_win, &flag);
    end_time = stop_synchronization();
    if (flag == 0)
      MPI_Win_wait(onesided_win);
  }
  else {
    reduced_group = exclude_all_ranks_except_from_group(0, onesided_group);
    mpiassert = extract_onesided_assertions(assertion, "MPI_Win_start");
    MPI_Win_start(reduced_group, mpiassert, onesided_win);
    if (do_a_put)
      MPI_Put(get_send_buffer(), count, datatype, 0, get_measurement_rank(),
              count, datatype, onesided_win);
    MPI_Win_complete(onesided_win);
    start_synchronization();
    stop_synchronization();
  }

The test spends more and more time in start_synchronization(), which
seems to calculate a certain timestamp and then busily reads wtime() until
that timestamp is reached. Each call to start_synchronization() takes
longer than the previous one, and eventually a single call spends tens of
seconds before it returns. We are not sure how the timestamp is calculated,
so we are cc'ing this email to the SKaMPI team in the hope that they can
give some insight here.
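
The busy wait described above can be sketched as follows. This is only an
illustration of the pattern, not SKaMPI's actual code: busy_wait_until() is
a hypothetical name, and wtime() is implemented here with POSIX
clock_gettime() as a stand-in for MPI_Wtime() so that the sketch builds
without MPI. How the real target timestamp is computed is exactly what we
do not know.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Wall-clock seconds. Assumption: stands in for MPI_Wtime(). */
static double wtime(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc; /* keep the compiler quiet when NDEBUG is defined */
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: spin until the local clock reaches `target`.
 * Returns the number of seconds spent spinning. */
static double busy_wait_until(double target)
{
    double begin = wtime();
    while (wtime() < target)
        ; /* busy read of the clock, no sleeping */
    return wtime() - begin;
}
```

With such a loop, the time spent waiting equals the distance between the
current clock reading and the target. If the target is computed further
into the future on each measurement, the wait grows accordingly, and the
process looks hung even though it is only spinning on the clock.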

Dear SKaMPI team, we have run into a problem running SKaMPI with
mvapich2-1.0 on 12 processes (3 nodes, 4 processes each, block
distribution). We find that start_synchronization() in the MPI_Win_test
measurement takes longer and longer to return as the test goes on. As a
result, the test appears to hang. We are not sure how the timestamp is
calculated or how you adjust its value. Could you please give us some
insight here?

Thanks.

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501


On Thu, 27 Sep 2007, Edmund Sumbar wrote:

> Edmund Sumbar wrote:
> > I'll try running the SKaMPI tests again.  Maybe
> > I missed something, as with the mvapich2 tests.
>
> I recompiled and reran SKaMPI pt2pt, coll,
> onesided, and mmisc tests on 3 nodes, 4
> processors per node.
>
> pt2pt and mmisc succeeded, while coll and
> onesided failed (stalled).  Any ideas?
>
> For what it's worth, here are the tails of
> the output files...
>
>
> $ tail coll_ib-3x4.sko
> # SKaMPI Version 5.0 rev. 191
>
> begin result "MPI_Bcast-nodes-short"
> nodes= 2     1024       3.8       0.2       39       2.9       3.6
> nodes= 3     1024       6.6       0.4       38       4.0       6.3       4.9
> nodes= 4     1024       9.2       0.2       32       4.6       7.7       7.6       8.6
>
>
> $ tail onesided_ib-3x4.sko
> cpus= 8        4   50051.7       1.3        8   50051.7    ---       ---       ---       ---       ---       ---       ---
> cpus= 9        4   50051.5       0.7        8   50051.5    ---       ---       ---       ---       ---       ---       ---       ---
> cpus= 10        4   50047.7       1.6        8   50047.7    ---       ---       ---       ---       ---       ---       ---       ---       ---
> cpus= 11        4   50058.2       2.7        8   50058.2    ---       ---       ---       ---       ---       ---       ---       ---       ---       ---
> cpus= 12        4   50074.3       2.8        8   50074.3    ---       ---       ---       ---       ---       ---       ---       ---       ---       ---       ---
> end result "MPI_Win_wait delayed,small"
> # duration = 9.00 sec
>
> begin result "MPI_Win_wait delayed without MPI_Put"
> cpus= 2  1048576   50025.0       1.4        8   50025.0    ---
>
>
> --
> Ed[mund [Sumbar]]
> AICT Research Support, Univ of Alberta
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>


