[Mvapich-discuss] OSU Micro-Benchmarks reduction result validation and expected value
Subramoni, Hari
subramoni.1 at osu.edu
Fri Nov 5 11:26:02 EDT 2021
Hi, Giorgos.
Thanks for the report. We appreciate it. We will take a look at it and get back to you soon.
Best,
Hari.
-----Original Message-----
From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> On Behalf Of Giorgos Katevainis via Mvapich-discuss
Sent: Friday, November 5, 2021 8:48 AM
To: mvapich-discuss at lists.osu.edu
Subject: [Mvapich-discuss] OSU Micro-Benchmarks reduction result validation and expected value
Hello,
I was running an osu_allreduce test with validation enabled (-c 1) and encountered some failures.
Digging around the code and the validation function(s), I noticed a possible problem/bug.
( I hope I'm sending this to the correct list, please correct me if I'm not :) )
In osu_util_mpi.c, in set_buffer_float(), each rank's _i_ item is set to:
(i + 1) * (iter + 1) * 1.0 (let us call this x)
In validate_reduction(), the expected result of the reduction is:
(i + 1) * (iter + 1) * 1.0 * num_procs
My concern here is that, assuming I understand floats correctly, x * num_procs can differ from the result of adding x to itself num_procs times, because each float addition may introduce its own rounding error. Instead, I would probably propose something like this for the expected value:
expected_buffer[i] = 0;
for (k = 0; k < num_procs; k++)
    expected_buffer[i] += (i + 1) * (iter + 1) * 1.0;
I'm not sure if the current technique is by design or if this detail was overlooked. A case could probably be made for either option's suitability.