[Mvapich-discuss] Patch for large one-sided operations.

Robison, Luke lrbison at amazon.com
Mon Aug 14 15:31:11 EDT 2023


Hello,

Please consider the following patch, which fixes a stack overflow in data validation for large one-sided operations.  It should apply cleanly to the 7.2.0 tarball.  To test it, run a one-sided benchmark with validation enabled and a particularly large message size, for example:
    ./c/mpi/one-sided/osu_acc_latency -c -m 268435456
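
For readers unfamiliar with the failure mode: a variable-length array such as `char buf[buf_size]` lives on the stack, so a 256 MiB message size far exceeds a typical stack limit, and there is no way to detect the failure. The sketch below shows the heap-based pattern in isolation. It is an illustration only, not code from OMB: `check_with_heap_copy` and its return convention are hypothetical names chosen for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of OMB: compares `src` against `expected`
 * through a heap-allocated staging copy.
 * Returns 0 on match, 1 on mismatch, -1 if allocation fails. */
static int check_with_heap_copy(const void *src, const void *expected,
                                size_t buf_size)
{
    int ret = -1;
    char *staging = NULL;

    /* Nothing to compare; also avoids relying on malloc(0) behaviour. */
    if (buf_size == 0)
        return 0;

    /* Unlike an oversized variable-length array, malloc reports failure
     * by returning NULL, so the caller can handle it gracefully. */
    staging = malloc(buf_size);
    if (!staging) {
        fprintf(stderr, "Failed to allocate staging buffer of %zu bytes\n",
                buf_size);
        goto out;
    }

    memcpy(staging, src, buf_size);
    ret = (memcmp(staging, expected, buf_size) != 0);

out:
    /* free(NULL) is a no-op, so a single exit path covers every case. */
    free(staging);
    return ret;
}
```

Initialising the pointer to NULL and freeing it on one shared exit path means every return after the allocation releases the buffer, whichever branch was taken.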

Thanks,
Luke Robison

-----------8<---------------------

Author: Shi Jin <sjina at amazon.com>
Date:   Mon Aug 14 19:14:27 2023 +0000

    osu_util_validation.c: Allocate buffer from heap

    Currently, local_addr_in_sysmem and local_result_in_sysmem
    are allocated on the stack, which can cause a stack overflow
    when buf_size is large. This patch fixes the issue by
    allocating them on the heap.

    Signed-off-by: Shi Jin <sjina at amazon.com>
    Signed-off-by: Luke Robison <lrbison at amazon.com>

diff --git a/c/util/osu_util_validation.c b/c/util/osu_util_validation.c
index c2cd1fa..722d541 100644
--- a/c/util/osu_util_validation.c
+++ b/c/util/osu_util_validation.c
@@ -418,8 +418,8 @@ int atomic_data_validation_check(MPI_Datatype datatype, MPI_Op op, int jrank,
     char expected_local_addr[MAX_ATOM_BYTES], dummy_remote_addr[MAX_ATOM_BYTES];
     char expected_local_result[MAX_ATOM_BYTES];

-    char local_addr_in_sysmem[buf_size];
-    char local_result_in_sysmem[buf_size];
+    char *local_addr_in_sysmem = NULL;
+    char *local_result_in_sysmem = NULL;
     int dtype_size;
     size_t natoms;
     int jatom;
@@ -469,6 +469,19 @@ int atomic_data_validation_check(MPI_Datatype datatype, MPI_Op op, int jrank,
         goto error;

     err = 0;
+    local_addr_in_sysmem = malloc(buf_size);
+    if (!local_addr_in_sysmem) {
+        fprintf(stderr, "Failed to allocate local_addr_in_sysmem buffer of size %lu\n", buf_size);
+        err = -1;
+        goto error;
+    }
+    local_result_in_sysmem = malloc(buf_size);
+    if (!local_result_in_sysmem) {
+        fprintf(stderr, "Failed to allocate local_result_in_sysmem buffer of size %lu\n", buf_size);
+        free(local_addr_in_sysmem);
+        err = -1;
+        goto error;
+    }
     if (check_addr)
         err |= get_hmem_buffer(local_addr_in_sysmem, addr, buf_size);
     if (check_result)
@@ -516,6 +529,8 @@ int atomic_data_validation_check(MPI_Datatype datatype, MPI_Op op, int jrank,
     atomic_dv_record(datatype, op, any_errors, 1);
     if (any_errors)
         *validation_results |= 1;
+    free(local_addr_in_sysmem);
+    free(local_result_in_sysmem);
     return 0;

 nocheck:


