[Mvapich-discuss] OSU Micro-Benchmarks reduction result validation and expected value

Giorgos Katevainis gkatev at ics.forth.gr
Fri Nov 5 08:48:25 EDT 2021


Hello,

I was running an osu_allreduce test with validation enabled (-c 1) and encountered some failures.
Digging around the code and the validation function(s), I noticed a possible problem/bug.

( I hope I'm sending this to the correct list, please correct me if I'm not :) )


In osu_util_mpi.c, in set_buffer_float(), the i-th element of each rank's buffer is set to:

(i + 1) * (iter + 1) * 1.0 (let us call this x)

In validate_reduction(), the expected result of the reduction is:

(i + 1) * (iter + 1) * 1.0 * num_procs

My concern here is that (assuming I understand floats correctly) x * num_procs, which is rounded
once, can differ from x added up num_procs times, which is rounded after every addition. Instead, I
would propose something like this for the expected value:

expected_buffer[i] = 0;
for (k = 0; k < num_procs; k++)
	expected_buffer[i] += (i + 1) * (iter + 1) * 1.0;

I'm not sure if the current technique is by design or if this detail was overlooked. A case could
probably be made for either option's suitability.




