[Mvapich-discuss] OSU Micro-Benchmarks reduction result validation and expected value

Subramoni, Hari subramoni.1 at osu.edu
Fri Nov 5 11:26:02 EDT 2021


Hi, Giorgos.

Thanks for the report. We appreciate it. We will take a look at it and get back to you soon.

Best,
Hari.

-----Original Message-----
From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> On Behalf Of Giorgos Katevainis via Mvapich-discuss
Sent: Friday, November 5, 2021 8:48 AM
To: mvapich-discuss at lists.osu.edu
Subject: [Mvapich-discuss] OSU Micro-Benchmarks reduction result validation and expected value

Hello,

I was running an osu_allreduce test with validation enabled (-c 1) and encountered some failures.
Digging around the code and the validation function(s), I noticed a possible problem/bug.

(I hope I'm sending this to the correct list; please correct me if I'm not :) )


In osu_util_mpi.c, in set_buffer_float(), the i-th element of each rank's buffer is set to:

(i + 1) * (iter + 1) * 1.0 (let us call this x)

In validate_reduction(), the expected result of the reduction is:

(i + 1) * (iter + 1) * 1.0 * num_procs

My concern here is that (assuming I understand floats correctly) x * num_procs is not necessarily equal to x added to itself num_procs times, because each floating-point addition can introduce a rounding error. Instead, I would probably propose something like this for the expected value:

/* add x once per rank instead of multiplying by num_procs */
expected_buffer[i] = 0;
for (k = 0; k < num_procs; k++)
	expected_buffer[i] += (i + 1) * (iter + 1) * 1.0;

I'm not sure if the current technique is by design or if this detail was overlooked. A case could probably be made for either option's suitability.


_______________________________________________
Mvapich-discuss mailing list
Mvapich-discuss at lists.osu.edu
https://lists.osu.edu/mailman/listinfo/mvapich-discuss


