[mvapich-discuss] MPI_Reduce(MPI_SUM) order

Hari Subramoni subramoni.1 at osu.edu
Mon Dec 7 10:11:48 EST 2015


Hi,

I tried a similar program locally and was not able to reproduce the issue
you mentioned; we did not see any validation errors. Could you please share
your reproducer with us so that we can try it out as well?

Thx,
Hari

On Mon, Dec 7, 2015 at 4:30 AM, Rutger Hofman <rutger at cs.vu.nl> wrote:

> Update: when run on 3 machines, iteration 2340 (counting starts at 0)
> gives a different result. On 4 machines, iteration 16 gives a different
> result, the same as with 5 machines. On 2 machines, I ran 1000000
> iterations without error.
>
> Rutger Hofman
> VU Amsterdam
> http://www.cs.vu.nl/das5
>
>
> On 12/07/2015 09:55 AM, Rutger Hofman wrote:
>
>> I wrote a little stand-alone C++ program to try to narrow down the
>> issue. It runs a tight loop of MPI_Reduce(float[K], ... MPI_SUM, ...)
>> with K=64, invoked every iteration with identical, well-formed float
>> inputs, and compares the result of each later iteration with the first
>> result. Since MPI_Reduce is deterministic in its spanning tree, these
>> should be bit-identical.
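>>
>> A minimal sketch of such a reproducer (my reconstruction for this post,
>> not necessarily the exact code; the input values, the loop count, and the
>> choice of rank 0 as root are placeholders) could look like this:
>>
>>     #include <mpi.h>
>>     #include <cstdio>
>>     #include <cstring>
>>
>>     int main(int argc, char **argv) {
>>         MPI_Init(&argc, &argv);
>>         int rank;
>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>>         const int K = 64;
>>         float in[K], out[K], first[K];
>>         for (int i = 0; i < K; i++)
>>             in[i] = 1.0f / (float)(rank + i + 1);   // fixed, well-formed inputs
>>
>>         for (int iter = 0; iter < 1000000; iter++) {
>>             MPI_Reduce(in, out, K, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);
>>             if (rank == 0) {
>>                 if (iter == 0)
>>                     memcpy(first, out, sizeof(first));       // remember first result
>>                 else if (memcmp(first, out, sizeof(first)) != 0)
>>                     printf("iteration %d differs bitwise from iteration 0\n", iter);
>>             }
>>         }
>>
>>         MPI_Finalize();
>>         return 0;
>>     }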
>>
>> The program ran on 5 machines, one thread/process per machine. After 16
>> iterations, the result differed from the first result. As in the issue I
>> reported below, the difference appears to be around the 7th significant
>> digit in quite a number of array fields -- this might even pass as
>> 'floating-point correct', but I suspect it is an artifact rather than a
>> feature.
>>
>> Conclusion: MPI_Reduce is /not/ deterministic, even within one run.
>> Since you explain that it should be deterministic, my guess is that some
>> internal MVapich2 state gets corrupted (and I see no reason to primarily
>> suspect the spanning tree).
>>
>> Should I post my code for ease of debugging? Are there other things that
>> I can do?
>>
>> Rutger
>>
>> On 12/04/2015 11:58 PM, Hari Subramoni wrote:
>>
>>> Hello,
>>>
>>> If the same configuration is used to run the MPI job, MVAPICH2
>>> preserves the order of operations, so there is no non-determinism and
>>> reduction operations are bitwise reproducible.
>>>
>>> However, the same guarantee cannot be made if the job is first run as 2
>>> nodes with 4 processes per node and then as 4 nodes with 2 processes per
>>> node. Further, no guarantees can be made across different sets of
>>> machines: the reduction order can change, and because floating-point
>>> addition is not associative, the low-order bits of the result can differ.
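>>>
>>> For example (a generic illustration of floating-point non-associativity,
>>> not MVAPICH2 code), a different summation order can legitimately change
>>> the low-order bits of a float sum:
>>>
>>>     #include <cstdio>
>>>
>>>     int main() {
>>>         float a = 1.0e8f, b = -1.0e8f, c = 1.0f;
>>>         // (a + b) + c == 1, but a + (b + c) == 0 in single precision,
>>>         // because b + c rounds back to -1.0e8f.
>>>         printf("%.9g %.9g\n", (a + b) + c, a + (b + c));
>>>         return 0;
>>>     }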
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>> Hari.
>>>
>>> On Fri, Dec 4, 2015 at 4:11 AM, Rutger Hofman <rutger at cs.vu.nl> wrote:
>>>
>>>     Good morning,
>>>
>>>     my application uses MVapich2 (locally labeled mvapich2/gcc/64/2.0b)
>>>     over InfiniBand in a CentOS cluster. I notice the following: when I
>>>     repeatedly run the application, the result of an MPI_Reduce(...,
>>>     MPI_FLOAT, ..., MPI_SUM) over an array of floats may differ between
>>>     runs, although the inputs are exactly the same (I checked the bit
>>>     patterns of the floats), the number of machines is the same, etc.
>>>     The actual machines allocated, and their connections to the
>>>     switches, may differ between runs -- I didn't try to pin the machine
>>>     allocation within the cluster. The difference in the reduce results
>>>     is small, of the order of magnitude one would expect if the
>>>     summation were carried out in a different order.
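>>>
>>>     (For the bit-pattern check, a hypothetical helper along these lines
>>>     -- not necessarily the code I actually used:)
>>>
>>>         #include <cstdint>
>>>         #include <cstring>
>>>
>>>         // True iff two floats are bitwise identical, not merely == equal
>>>         // (== would also accept e.g. +0.0f and -0.0f).
>>>         static bool same_bits(float a, float b) {
>>>             uint32_t ua, ub;
>>>             memcpy(&ua, &a, sizeof ua);
>>>             memcpy(&ub, &b, sizeof ub);
>>>             return ua == ub;
>>>         }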
>>>
>>>     My question: is it possible with MVapich2 that the internal order of
>>>     the reduce operations differs between runs, even when the number of
>>>     machines is the same? Is it easy/possible to enforce a fixed order in
>>>     the reduce implementation, just to verify this? Or should I suspect a
>>>     bug of my own, like some weird memory corruption? My application also
>>>     uses RDMA verbs natively; in principle that should work fine.
>>>
>>>     Thank you for your advice,
>>>
>>>     Rutger Hofman
>>>     VU Amsterdam DAS5 http://www.cs.vu.nl/das5
>>
>