[mvapich-discuss] MPI_Reduce(MPI_SUM) order

John Donners john.donners at surfsara.nl
Mon Dec 7 04:42:30 EST 2015


Hi Rutger,

I noticed that printing floats can give varying results (in the last 
decimal) with the same data, probably depending on the alignment of the 
(internal) variable that is printed. Have you checked that it is really 
the data, and not the output, that differs? A solution would be to print 
the hexadecimal representation of your data.

With regards,
John

On 07-12-15 10:30, Rutger Hofman wrote:
> Update: when run on 3 machines, iteration 2340 (counting starts at 0) 
> gives a different result. On 4 machines, iteration 16 gives a different 
> result, the same iteration as with 5 machines. On 2 machines, I ran 
> 1000000 iterations without error.
>
> Rutger Hofman
> VU Amsterdam
> http://www.cs.vu.nl/das5
>
> On 12/07/2015 09:55 AM, Rutger Hofman wrote:
>> I wrote a little stand-alone C++ program to try to narrow down the issue.
>> It performs a tight loop of MPI_Reduce(float[K], ... MPI_SUM, ...) with
>> K=64, invoked with identical parameters of well-formed floats. It
>> compares the result of each later iteration with the first result. Since
>> MPI_Reduce is deterministic in its spanning tree, these should be
>> bit-identical.
>>
>> The program ran as an application on 5 machines, one thread/process per
>> machine. After 16 iterations, the result differs from the first result.
>> Similar to my issue reported below, the difference is approximately in
>> the 7th significant digit in quite a number of array fields -- this might
>> even pass for 'floating-point correct', but I suspect that is an
>> artifact rather than a feature.
>>
>> Conclusion: MPI_Reduce is /not/ deterministic, even within one run.
>> Since you explain that it should be deterministic, my guess is that some
>> internal MVapich2 state gets corrupted (and I don't see reasons to
>> primarily suspect the spanning tree).
>>
>> Should I post my code for ease of debugging? Are there other things that
>> I can do?
>>
>> Rutger
>>
>> On 12/04/2015 11:58 PM, Hari Subramoni wrote:
>>> Hello,
>>>
>>> If the configuration chosen to run the MPI job stays the same, then
>>> MVAPICH2 preserves the order of operations, so no non-determinism
>>> exists. This makes reduction operations bitwise reproducible.
>>>
>>> However, the same guarantees can't be made if the job is first run on 2
>>> nodes with 4 processes per node and then on 4 nodes with 2 processes
>>> per node. Further, no guarantees can be made across multiple sets of
>>> machines, due to the inherent non-determinism in the results of
>>> floating-point operations at very high precision.
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>> Hari.
>>>
>>> On Fri, Dec 4, 2015 at 4:11 AM, Rutger Hofman <rutger at cs.vu.nl
>>> <mailto:rutger at cs.vu.nl>> wrote:
>>>
>>>     Good morning,
>>>
>>>     my application uses MVapich2 (locally labeled mvapich2/gcc/64/2.0b)
>>>     over Infiniband in a CentOS cluster. I notice the following. When I
>>>     repeatedly run the application, the result of an MPI_Reduce(...,
>>>     MPI_FLOAT, ..., MPI_SUM) over an array of floats may be different
>>>     across runs, although the inputs are exactly the same (I checked
>>>     the bit patterns of the floats), the number of machines is the
>>>     same, etc. The actual machines allocated, and their connections to
>>>     the switches, may differ between runs -- I didn't try to fix the
>>>     machine allocation within the cluster. The difference in the reduce
>>>     results is small, of the order of magnitude one would expect if the
>>>     summation is carried out in a different order.
>>>
>>>     My question: is it possible with MVapich2 that the internal order
>>>     of the reduce operations is different, even if the number of
>>>     machines is equal? Is it easy/possible to enforce a fixed order in
>>>     the reduce implementation, just to verify this? Or should I suspect
>>>     some bug of my own, like some weird memory corruption? My
>>>     application also uses RDMA verbs natively; in principle that should
>>>     work fine.
>>>
>>>     Thank you for your advice,
>>>
>>>     Rutger Hofman
>>>     VU Amsterdam DAS5 http://www.cs.vu.nl/das5
>>>     _______________________________________________
>>>     mvapich-discuss mailing list
>>>     mvapich-discuss at cse.ohio-state.edu
>>>     <mailto:mvapich-discuss at cse.ohio-state.edu>
>>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>
>>>
>>
>


-- 
SURFdrive: the personal cloud storage service for Dutch higher education and research.

| John Donners | Senior advisor | Operations, Support & Development | SURFsara | Science Park 140 | 1098 XG Amsterdam | The Netherlands |
T (31)6 19039023 | john.donners at surfsara.nl | www.surfsara.nl |

Present on | Mon | Tue | Wed | Thu | Fri


