[Mvapich-discuss] OSU Micro-Benchmarks reduction result validation and expected value
Giorgos Katevainis
gkatev at ics.forth.gr
Fri Nov 5 08:48:25 EDT 2021
Hello,
I was running an osu_allreduce test with validation enabled (-c 1) and encountered some failures.
Digging around the code and the validation function(s), I noticed a possible problem/bug.
( I hope I'm sending this to the correct list, please correct me if I'm not :) )
In osu_util_mpi.c, set_buffer_float() sets element i of each rank's buffer to:
    (i + 1) * (iter + 1) * 1.0        (let us call this x)
In validate_reduction(), the expected result of the reduction is:
    (i + 1) * (iter + 1) * 1.0 * num_procs
My concern here is that (assuming I understand floats correctly) x * num_procs, which is rounded
only once, is not necessarily equal to x added to itself num_procs times, where every intermediate
sum is rounded to float. Instead, I would probably propose something like this for the expected
value:
    expected_buffer[i] = 0;
    for (k = 0; k < num_procs; k++)
        expected_buffer[i] += (i + 1) * (iter + 1) * 1.0;
I'm not sure if the current technique is by design or if this detail was overlooked. A case could
probably be made for either option's suitability.