[Mvapich-discuss] Possible buffer overflow for large messages?

John Moore john at flexcompute.com
Wed Sep 28 17:43:58 EDT 2022


Hi Hari,

After some further investigation today, I found that even if I split the
Send/Receives into smaller messages and passed an offset pointer into the
dataBuffer, I still ended up with an incorrect result for the second data
object, which is really strange. Even with message sizes as small as 1 MB,
the resulting data eventually came out incorrect the second time around.
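
For reference, the chunked version looked roughly like this (a simplified
sketch; the chunk size, rank variables, and dataBuffer pointer are
placeholders for what our code actually does, and the usual mpi.h/stdlib.h
includes are assumed):

    /* Sketch: move the large buffer in 1 MB pieces, offsetting into
     * dataBuffer so each MPI call sees a count well within the int limit. */
    const size_t totalBytes = 8ull << 30;   /* ~8 GB */
    const size_t chunkBytes = 1ull << 20;   /* 1 MB per message */
    char *dataBuffer = malloc(totalBytes);

    for (size_t offset = 0; offset < totalBytes; offset += chunkBytes) {
        size_t n = totalBytes - offset < chunkBytes ? totalBytes - offset
                                                    : chunkBytes;
        if (rank == senderRank)
            MPI_Send(dataBuffer + offset, (int)n, MPI_BYTE, receiverRank,
                     0, MPI_COMM_WORLD);
        else if (rank == receiverRank)
            MPI_Recv(dataBuffer + offset, (int)n, MPI_BYTE, senderRank,
                     0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }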

So I tried staging the data first in a smaller buffer of size < 2 GB, which
MPI_Send/MPI_Recv used directly, and then memcpy'd the data from these
staging buffers into the global data, which is an 8 GB block of memory.
Perhaps there is a bug in the registration of large data pointers, in
excess of 2 or 4 GB?
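
The staging version, in the same simplified form (stagingBytes, globalData,
and the rank variables are again just placeholders; the send side chunks
the data the same way):

    /* Receive into a small (< 2 GB) staging buffer that MPI_Recv touches
     * directly, then memcpy each piece into the 8 GB globalData block. */
    const size_t stagingBytes = 1ull << 30;   /* 1 GB staging buffer */
    char *staging = malloc(stagingBytes);
    if (rank == receiverRank) {
        for (size_t offset = 0; offset < totalBytes; offset += stagingBytes) {
            size_t n = totalBytes - offset < stagingBytes
                           ? totalBytes - offset : stagingBytes;
            MPI_Recv(staging, (int)n, MPI_BYTE, senderRank, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            memcpy(globalData + offset, staging, n);
        }
    }
    free(staging);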

Best Regards,
John

On Wed, Sep 28, 2022 at 2:52 PM John Moore <john at flexcompute.com> wrote:

> Hi Hari,
>
> Thank you,
>
> It is actually rather hard to reproduce in a standalone example, in my
> experience. I wrote a simple standalone example with the same partitioning
> as the actual case and was not able to reproduce it. We have several
> MPI_Gatherv calls that operate on distributed data of the same size, and
> we allocate memory to store the gathered data.
>
> Interestingly, the result is correct for the first data object that we
> gather, but after we allocate memory to store the result elsewhere and
> communicate a second data object representing the exact same data, we get
> an incorrect result from the Gatherv.
>
>
>
> On Wed, Sep 28, 2022 at 2:41 PM Subramoni, Hari <subramoni.1 at osu.edu>
> wrote:
>
>> Hi, John.
>>
>>
>>
>> Sorry to hear that you’re facing issues. Let us try this out internally
>> and get back to you shortly.
>>
>>
>>
>> Thx,
>>
>> Hari.
>>
>>
>>
>> From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> On
>> Behalf Of John Moore via Mvapich-discuss
>> Sent: Wednesday, September 28, 2022 1:38 PM
>> To: mvapich-discuss at lists.osu.edu
>> Subject: [Mvapich-discuss] Possible buffer overflow for large messages?
>>
>>
>>
>> Hello,
>>
>>
>>
>> We have a code that does a large Gatherv operation where the size of the
>> gathered message exceeds 4 GB; it is approximately 8 GB. We have noticed
>> that the result of the Gatherv operation is incorrect for these large
>> calls. The counts that we are passing into Gatherv are all within the int
>> limit, and we are using custom datatypes (MPI_Type_contiguous) to allow
>> for this larger message size.
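>>
>> For reference, the call pattern is roughly the following (a simplified
>> sketch; the block length and variable names are placeholders for what
>> our code actually does):
>>
>>     /* Each element of the derived type covers blockLen doubles, so the
>>      * counts and displacements passed to MPI_Gatherv stay well within
>>      * the int limit even though the gathered buffer is ~8 GB. */
>>     MPI_Datatype blockType;
>>     MPI_Type_contiguous(blockLen, MPI_DOUBLE, &blockType);
>>     MPI_Type_commit(&blockType);
>>     MPI_Gatherv(sendBuf, localBlocks, blockType,
>>                 recvBuf, recvBlockCounts, blockDispls, blockType,
>>                 rootRank, MPI_COMM_WORLD);
>>     MPI_Type_free(&blockType);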
>>
>>
>>
>> We have also tried replacing the Gatherv call with Isend/Irecv calls,
>> which are all within the int representation range in terms of the number of
>> bytes communicated, with the same incorrect result.
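>>
>> The Isend/Irecv version followed roughly this pattern (simplified sketch;
>> byteCounts, byteDispls, and the rank/buffer names are placeholders, with
>> recvBuf a char pointer and the displacements kept in size_t because the
>> byte offsets can exceed the int range):
>>
>>     /* Root posts one receive per remote rank at the right byte offset
>>      * and copies its own contribution locally; every individual count
>>      * stays below the int limit. */
>>     if (rank == rootRank) {
>>         MPI_Request *reqs = malloc(nranks * sizeof *reqs);
>>         int nreq = 0;
>>         for (int r = 0; r < nranks; ++r) {
>>             if (r == rootRank)
>>                 memcpy(recvBuf + byteDispls[r], sendBuf, byteCounts[r]);
>>             else
>>                 MPI_Irecv(recvBuf + byteDispls[r], byteCounts[r],
>>                           MPI_BYTE, r, 0, MPI_COMM_WORLD, &reqs[nreq++]);
>>         }
>>         MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);
>>         free(reqs);
>>     } else {
>>         MPI_Request req;
>>         MPI_Isend(sendBuf, localBytes, MPI_BYTE, rootRank, 0,
>>                   MPI_COMM_WORLD, &req);
>>         MPI_Wait(&req, MPI_STATUS_IGNORE);
>>     }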
>>
>>
>>
>> When we compile with OpenMPI, the result is correct. Also, when we run
>> the operations on smaller data sets with MVAPICH2, the result is correct.
>>
>> This job is being run across two nodes with 16 ranks total (8 ranks per
>> node). When we place all the data on a single node and use the same input
>> data and number of ranks, we again get the correct result. This leads me
>> to believe that some remote send/receive buffer is being exceeded.
>>
>>
>>
>> We are running MVAPICH2-GDR-2.3.6, but these buffers are all CPU buffers,
>> and we are running this executable with MV2_USE_CUDA=0. Perhaps there are
>> some environment variables we should change here? Any advice would be
>> greatly appreciated.
>>
>>
>>
>> Thank you,
>>
>> John
>>
>