[Mvapich-discuss] Patch for large one-sided operations.

Subramoni, Hari subramoni.1 at osu.edu
Wed Aug 16 07:58:31 EDT 2023


Hi, Luke/Shi.

Many thanks for reporting the issue and providing the patch. We will review it and merge it into the code base, with an acknowledgement to both of you.

Best,
Hari.

-----Original Message-----
From: Mvapich-discuss <mvapich-discuss-bounces+subramon=cse.ohio-state.edu at lists.osu.edu> On Behalf Of Robison, Luke via Mvapich-discuss
Sent: Monday, August 14, 2023 3:31 PM
To: mvapich-discuss at lists.osu.edu
Cc: Jin, Shi <sjina at amazon.com>
Subject: [Mvapich-discuss] Patch for large one-sided operations.

Hello,

Please consider the following patch, which fixes a stack overflow during validation of large one-sided operations. It should apply cleanly to the 7.2.0 tarball. To reproduce the problem, run a one-sided benchmark with validation enabled and a large message size, for example:
    ./c/mpi/one-sided/osu_acc_latency -c -m 268435456
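
For illustration, the pattern the patch applies can be sketched in isolation as below. This is a minimal standalone sketch, not code from the benchmark: the function name staged_byte_sum and its parameters are hypothetical, and buf_size is assumed to be a size_t. It stages the data in a heap buffer instead of a variable-length array on the stack, and releases the buffer on a single exit path.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the OSU benchmarks): copy buf_size bytes
 * from src into a scratch buffer and sum them.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated.
 */
int staged_byte_sum(const void *src, size_t buf_size, unsigned long *sum_out)
{
    int err = -1;
    unsigned char *scratch = NULL;
    unsigned long sum = 0;
    size_t i;

    /*
     * A VLA such as `char scratch[buf_size];` would be placed on the stack
     * and can overflow it when buf_size is hundreds of megabytes.
     * Allocating from the heap avoids that limit. Request at least one
     * byte so a zero-sized call still gets a valid pointer.
     */
    scratch = malloc(buf_size ? buf_size : 1);
    if (!scratch) {
        fprintf(stderr, "Failed to allocate scratch buffer of size %zu\n",
                buf_size);
        goto out;
    }

    memcpy(scratch, src, buf_size);
    for (i = 0; i < buf_size; i++)
        sum += scratch[i];

    *sum_out = sum;
    err = 0;

out:
    /* free(NULL) is a no-op, so one exit path covers success and failure. */
    free(scratch);
    return err;
}
```

Routing every return through one label that frees the buffer is a design choice for this sketch; it keeps later error paths from leaking the allocation.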

Thanks,
Luke Robison

-----------8<---------------------

Author: Shi Jin <sjina at amazon.com>
Date:   Mon Aug 14 19:14:27 2023 +0000

    osu_util_validation.c: Allocate buffer from heap

    Currently, local_addr_in_sysmem and local_result_in_sysmem
    are allocated on the stack, which can cause a stack overflow
    when buf_size is large. This patch fixes the issue by
    allocating them on the heap.

    Signed-off-by: Shi Jin <sjina at amazon.com>
    Signed-off-by: Luke Robison <lrbison at amazon.com>

diff --git a/c/util/osu_util_validation.c b/c/util/osu_util_validation.c
index c2cd1fa..722d541 100644
--- a/c/util/osu_util_validation.c
+++ b/c/util/osu_util_validation.c
@@ -418,8 +418,8 @@ int atomic_data_validation_check(MPI_Datatype datatype, MPI_Op op, int jrank,
     char expected_local_addr[MAX_ATOM_BYTES], dummy_remote_addr[MAX_ATOM_BYTES];
     char expected_local_result[MAX_ATOM_BYTES];

-    char local_addr_in_sysmem[buf_size];
-    char local_result_in_sysmem[buf_size];
+    char *local_addr_in_sysmem = NULL;
+    char *local_result_in_sysmem = NULL;
     int dtype_size;
     size_t natoms;
     int jatom;
@@ -469,6 +469,19 @@ int atomic_data_validation_check(MPI_Datatype datatype, MPI_Op op, int jrank,
         goto error;

     err = 0;
+    local_addr_in_sysmem = malloc(buf_size);
+    if (!local_addr_in_sysmem) {
+        fprintf(stderr, "Failed to allocate local_addr_in_sysmem buffer of size %lu\n", buf_size);
+        err = -1;
+        goto error;
+    }
+    local_result_in_sysmem = malloc(buf_size);
+    if (!local_result_in_sysmem) {
+        fprintf(stderr, "Failed to allocate local_result_in_sysmem buffer of size %lu\n", buf_size);
+        free(local_addr_in_sysmem);
+        err = -1;
+        goto error;
+    }
     if (check_addr)
         err |= get_hmem_buffer(local_addr_in_sysmem, addr, buf_size);
     if (check_result)
@@ -516,6 +529,8 @@ int atomic_data_validation_check(MPI_Datatype datatype, MPI_Op op, int jrank,
     atomic_dv_record(datatype, op, any_errors, 1);
     if (any_errors)
         *validation_results |= 1;
+    free(local_addr_in_sysmem);
+    free(local_result_in_sysmem);
     return 0;

 nocheck: