[Mvapich-discuss] Patch for large one-sided operations.
Robison, Luke
lrbison at amazon.com
Mon Aug 14 15:31:11 EDT 2023
Hello,
Please consider the following patch, which fixes a stack overflow during validation of large one-sided operations. It should apply cleanly to the 7.2.0 tarball. To test it, run with validation enabled (-c) and a particularly large message size (-m), for example:
./c/mpi/one-sided/osu_acc_latency -c -m 268435456
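For context, 268435456 bytes is 256 MiB, so two stack arrays of buf_size bytes would need about 512 MiB of stack, well beyond the default stack limit on most systems (commonly 8 MiB on Linux). The following is only a minimal sketch of the heap-based pattern, not the patched function itself: compare_via_heap_scratch and its return codes are hypothetical names made up for illustration, and it assumes buf_size is a size_t.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a validation routine (not the OSU function).
 * Copies both inputs into heap-allocated scratch buffers and compares them.
 * Returns 0 if the buffers match, 1 if they differ, -1 if allocation fails.
 */
static int compare_via_heap_scratch(const void *addr, const void *result,
                                    size_t buf_size)
{
    char *addr_copy = NULL;
    char *result_copy = NULL;
    int ret = -1;

    assert(addr != NULL && result != NULL);

    /* Heap allocation instead of `char addr_copy[buf_size];` on the stack. */
    addr_copy = malloc(buf_size);
    result_copy = malloc(buf_size);
    if (!addr_copy || !result_copy) {
        fprintf(stderr, "Failed to allocate scratch buffers of size %zu\n",
                buf_size);
        goto cleanup;
    }

    memcpy(addr_copy, addr, buf_size);
    memcpy(result_copy, result, buf_size);
    ret = (memcmp(addr_copy, result_copy, buf_size) != 0);

cleanup:
    /* free(NULL) is a no-op, so one cleanup path covers every exit. */
    free(addr_copy);
    free(result_copy);
    return ret;
}
```

Routing every exit through a single cleanup label means both buffers are released whether the function succeeds or fails, without a separate free() in each error branch.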
Thanks,
Luke Robison
-----------8<---------------------
Author: Shi Jin <sjina at amazon.com>
Date: Mon Aug 14 19:14:27 2023 +0000
osu_util_validation.c: Allocate buffer from heap
Currently, local_addr_in_sysmem and local_result_in_sysmem
are allocated on the stack, which can cause a stack overflow
when buf_size is large. This patch fixes the issue by
allocating them on the heap instead.
Signed-off-by: Shi Jin <sjina at amazon.com>
Signed-off-by: Luke Robison <lrbison at amazon.com>
diff --git a/c/util/osu_util_validation.c b/c/util/osu_util_validation.c
index c2cd1fa..722d541 100644
--- a/c/util/osu_util_validation.c
+++ b/c/util/osu_util_validation.c
@@ -418,8 +418,8 @@ int atomic_data_validation_check(MPI_Datatype datatype, MPI_Op op, int jrank,
char expected_local_addr[MAX_ATOM_BYTES], dummy_remote_addr[MAX_ATOM_BYTES];
char expected_local_result[MAX_ATOM_BYTES];
- char local_addr_in_sysmem[buf_size];
- char local_result_in_sysmem[buf_size];
+ char *local_addr_in_sysmem = NULL;
+ char *local_result_in_sysmem = NULL;
int dtype_size;
size_t natoms;
int jatom;
@@ -469,6 +469,19 @@ int atomic_data_validation_check(MPI_Datatype datatype, MPI_Op op, int jrank,
goto error;
err = 0;
+ local_addr_in_sysmem = malloc(buf_size);
+ if (!local_addr_in_sysmem) {
+ fprintf(stderr, "Failed to allocate local_addr_in_sysmem buffer of size %lu\n", buf_size);
+ err = -1;
+ goto error;
+ }
+ local_result_in_sysmem = malloc(buf_size);
+ if (!local_result_in_sysmem) {
+ fprintf(stderr, "Failed to allocate local_result_in_sysmem buffer of size %lu\n", buf_size);
+ free(local_addr_in_sysmem);
+ err = -1;
+ goto error;
+ }
if (check_addr)
err |= get_hmem_buffer(local_addr_in_sysmem, addr, buf_size);
if (check_result)
@@ -516,6 +529,8 @@ int atomic_data_validation_check(MPI_Datatype datatype, MPI_Op op, int jrank,
atomic_dv_record(datatype, op, any_errors, 1);
if (any_errors)
*validation_results |= 1;
+ free(local_addr_in_sysmem);
+ free(local_result_in_sysmem);
return 0;
nocheck: