[mvapich-discuss] collectives fail under mvapich2-1.0 (fwd)
Ralf Reussner
reussner at ipd.uka.de
Thu Oct 4 05:20:30 EDT 2007
Dear Wei Huang,
Thanks for your report. We will look into this. In further discussions,
please use the address of our developer and maintainer team,
skampi at ira.uka.de, instead of my own email address, so that all of us
are kept informed.
Best regards
Ralf
> Hi Ed,
>
> We looked further into the problem with respect to the one-sided issues.
> However, we don't see the program hanging inside the MPI library; in fact,
> the program is not hanging at all. For MPI_Win_test, though, we find the
> following code:
>
> if (get_measurement_rank() == 0) {
>     reduced_group = exclude_rank_from_group(0, onesided_group);
>     mpiassert = extract_onesided_assertions(assertion, "MPI_Win_post");
>     MPI_Win_post(reduced_group, mpiassert, onesided_win);
>
>     start_time = start_synchronization();
>     MPI_Win_test(onesided_win, &flag);
>     end_time = stop_synchronization();
>     if (flag == 0)
>         MPI_Win_wait(onesided_win);
> }
> else {
>     reduced_group = exclude_all_ranks_except_from_group(0, onesided_group);
>     mpiassert = extract_onesided_assertions(assertion, "MPI_Win_start");
>     MPI_Win_start(reduced_group, mpiassert, onesided_win);
>     if (do_a_put)
>         MPI_Put(get_send_buffer(), count, datatype, 0, get_measurement_rank(),
>                 count, datatype, onesided_win);
>     MPI_Win_complete(onesided_win);
>     start_synchronization();
>     stop_synchronization();
> }
>
> The test spends more and more time in start_synchronization(),
> which seems to calculate a certain timestamp and then busily reads wtime()
> until that timestamp is reached. We find that start_synchronization()
> takes longer and longer, and eventually spends tens of seconds
> before it returns. We are not sure how the timestamp is calculated, so we
> are cc'ing this email to the SKaMPI team in the hope that they can give
> some insight here.
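For readers following the thread, the behaviour described above (compute a
target timestamp, then poll the clock until it is reached) can be sketched as
below. This is only an illustration and not SKaMPI's actual code: wtime() here
is a hypothetical stand-in built on the C11 timespec_get() call rather than
MPI_Wtime(), and wait_until() is a made-up name. The sketch shows that the
time spent in such a loop is simply the distance between the target and the
current clock reading, so a target that lands further in the future makes the
call spin correspondingly longer.

```c
#include <time.h>

/* Hypothetical stand-in for MPI_Wtime(): wall-clock time in seconds. */
double wtime(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Busy-wait until the clock reaches `target` (seconds, same time base as
 * wtime()). Returns the number of seconds spent inside the call. */
double wait_until(double target)
{
    double entered = wtime();
    while (wtime() < target) {
        /* spin: keep re-reading the clock */
    }
    return wtime() - entered;
}
```

Under these assumptions, wait_until(wtime() + 0.02) should spin for roughly
20 ms, while a target that already lies in the past returns almost
immediately.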
>
> Dear SKaMPI team, we face a problem running SKaMPI with mvapich2-1.0 on
> 12 processes (3 nodes, 4 processes each, block distribution). We find that
> start_synchronization() in the MPI_Win_test measurement takes a very long
> time to return as the test goes on. As a result, the test appears to hang.
> We are not sure how the timestamp is calculated and how you adjust this
> value. Could you please give us some insight here?
>
> Thanks.
>
> Regards,
> Wei Huang
>
> 774 Dreese Lab, 2015 Neil Ave,
> Dept. of Computer Science and Engineering
> Ohio State University
> OH 43210
> Tel: (614)292-8501
>
>
> On Thu, 27 Sep 2007, Edmund Sumbar wrote:
>
>
>> Edmund Sumbar wrote:
>>
>>> I'll try running the SKaMPI tests again. Maybe
>>> I missed something, as with the mvapich2 tests.
>>>
>> I recompiled and reran SKaMPI pt2pt, coll,
>> onesided, and mmisc tests on 3 nodes, 4
>> processors per node.
>>
>> pt2pt and mmisc succeeded, while coll and
>> onesided failed (stalled). Any ideas?
>>
>> For what it's worth, here are the tails of
>> the output files...
>>
>>
>> $ tail coll_ib-3x4.sko
>> # SKaMPI Version 5.0 rev. 191
>>
>> begin result "MPI_Bcast-nodes-short"
>> nodes= 2 1024 3.8 0.2 39 2.9 3.6
>> nodes= 3 1024 6.6 0.4 38 4.0 6.3 4.9
>> nodes= 4 1024 9.2 0.2 32 4.6 7.7 7.6 8.6
>>
>>
>> $ tail onesided_ib-3x4.sko
>> cpus= 8 4 50051.7 1.3 8 50051.7 --- --- --- --- --- --- ---
>> cpus= 9 4 50051.5 0.7 8 50051.5 --- --- --- --- --- --- --- ---
>> cpus= 10 4 50047.7 1.6 8 50047.7 --- --- --- --- --- --- --- --- ---
>> cpus= 11 4 50058.2 2.7 8 50058.2 --- --- --- --- --- --- --- --- --- ---
>> cpus= 12 4 50074.3 2.8 8 50074.3 --- --- --- --- --- --- --- --- --- --- ---
>> end result "MPI_Win_wait delayed,small"
>> # duration = 9.00 sec
>>
>> begin result "MPI_Win_wait delayed without MPI_Put"
>> cpus= 2 1048576 50025.0 1.4 8 50025.0 ---
>>
>>
>> --
>> Ed[mund [Sumbar]]
>> AICT Research Support, Univ of Alberta
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
>
>
--
--------------------------------------------------------------
Prof. Dr. Ralf Reussner - Chair Software-Design and -Quality
Institute for Program Structures and Data Organization
Faculty of Informatics, Universitaet Karlsruhe (TH)
Am Fasanengarten 5, D-76131 Karlsruhe, Germany
Office 327, Main Computer Science Building (50.34)
Tel. +49 721 608 5993, Fax. +49 721 608 5990
http://sdq.ipd.uka.de
--------------------------------------------------------------