[mvapich-discuss] MPI_Reduce(MPI_SUM) order

Hari Subramoni subramoni.1 at osu.edu
Fri Dec 4 17:58:28 EST 2015


Hello,

If the configuration used to run the MPI job is retained (same nodes, same
number of processes per node), MVAPICH2 retains the order of operations, so
there is no non-determinism and the reduction results are bitwise
reproducible.

However, the same guarantee cannot be made if the job is first run on 2
nodes with 4 processes per node and then on 4 nodes with 2 processes per
node: a different process layout generally leads to a different internal
reduction order. For the same reason, no guarantees can be made across
different sets of machines, since the reduction order determines the
low-order bits of a floating-point sum.
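If you want to confirm on your side that the differences come only from the
reduction order, one option is to compare MPI_Reduce against a reduction
whose order you fix yourself, e.g. by gathering all contributions to the
root and summing them strictly in rank order. A minimal sketch (plain MPI C,
not MVAPICH2 internals; the per-rank value is just a placeholder):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One contribution per rank; a real code would reduce whole arrays. */
    float local = 1.0f / (float)(rank + 1);

    /* Library reduction: the summation tree is chosen internally. */
    float lib_sum = 0.0f;
    MPI_Reduce(&local, &lib_sum, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);

    /* Fixed-order reference: gather everything to rank 0 and sum
       strictly in rank order, which is bitwise reproducible. */
    float *all = NULL;
    if (rank == 0)
        all = malloc((size_t)size * sizeof(float));
    MPI_Gather(&local, 1, MPI_FLOAT, all, 1, MPI_FLOAT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        float fixed_sum = 0.0f;
        for (int i = 0; i < size; i++)
            fixed_sum += all[i];
        printf("MPI_Reduce: %.9g   fixed-order: %.9g\n", lib_sum, fixed_sum);
        free(all);
    }

    MPI_Finalize();
    return 0;
}

If the fixed-order sum is bitwise identical across runs while the
MPI_Reduce result varies, the differences come from the internal reduction
order rather than from memory corruption on your side.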

Hope this helps.

Regards,
Hari.

On Fri, Dec 4, 2015 at 4:11 AM, Rutger Hofman <rutger at cs.vu.nl> wrote:

> Good morning,
>
> My application uses MVAPICH2 (locally labeled mvapich2/gcc/64/2.0b) over
> InfiniBand in a CentOS cluster. I notice the following: when I repeatedly
> run the application, the result of an MPI_Reduce(..., MPI_FLOAT, ...,
> MPI_SUM) over an array of floats may differ between runs, although the
> inputs are exactly the same (I checked the bit patterns of the floats),
> the number of machines is the same, etc. The actual machines allocated,
> and their connections to the switches, may differ between runs -- I
> didn't try to pin the machine allocation within the cluster. The
> differences in the reduce results are small, of the order of magnitude
> one would expect if the summation were carried out in a different order.
>
> My question: is it possible with MVAPICH2 that the internal order of the
> reduce operations differs, even if the number of machines is the same? Is
> it easy/possible to enforce a fixed order in the reduce implementation,
> just to verify this? Or should I suspect some bug of my own, like some
> weird memory corruption? My application also uses RDMA verbs natively; in
> principle that should work fine.
>
> Thank you for your advice,
>
> Rutger Hofman
> VU Amsterdam DAS5 http://www.cs.vu.nl/das5