[Mvapich-discuss] Excessive Memory Usage By Vbufs

Derek Gaston friedmud at gmail.com
Thu Jun 17 11:34:48 EDT 2021


Thank you both - please let me know if there is anything else we can do to
help track it down.

I also noticed that my table got destroyed by the mailing list - so I added
it as a comment at the bottom of the gist here:
https://gist.github.com/friedmud/9533d5997f06414c25f8c5c57a1eaf37

As a related question: our application needs to do this a _lot_ (lots of
small messages sent asynchronously).  All of the creation of temporary
buffers and copying of data seems like it may be slowing us down in
general.  Do you think it would be a good idea to turn eager sending off
(or way down) altogether?
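
For concreteness, here's the kind of thing I was considering trying first
(just a sketch - the parameter names are my reading of the MVAPICH2 user
guide, and normally we'd export them on the job launch line rather than set
them from code; please correct me if these aren't the right knobs):

    // Sketch only: lower the eager threshold so these small sends take the
    // rendezvous path, and keep the vbuf size in line with it.  Parameter
    // names/values are placeholders based on my reading of the user guide,
    // and they must be in the environment before MPI_Init is called.
    #include <mpi.h>
    #include <cstdlib>

    int main(int argc, char** argv)
    {
        setenv("MV2_IBA_EAGER_THRESHOLD", "1024", 1);  // bytes (placeholder)
        setenv("MV2_VBUF_TOTAL_SIZE",     "1024", 1);  // often matched to the threshold

        MPI_Init(&argc, &argv);
        // ... our usual isend/issend traffic would go here ...
        MPI_Finalize();
        return 0;
    }

My worry is that pushing everything through the rendezvous path trades the
extra copy for an extra handshake per message, so it may not be a net win -
hence the question.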

Thanks again!

Derek

On Thu, Jun 17, 2021 at 9:05 AM Subramoni, Hari <subramoni.1 at osu.edu> wrote:

> Hi, Derek.
>
>
>
> Thanks for reporting this to us.
>
>
>
> There are a few potential solutions I can think of. Let us try them out
> and get back to you.
>
>
>
> Best,
>
> Hari.
>
>
>
> PS: The ID “mvapich-discuss at cse.ohio-state.edu” has been discontinued and
> replaced with mvapich-discuss at lists.osu.edu. That is probably why your
> previous e-mail bounced.
>
>
>
> From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> On Behalf Of Derek Gaston via Mvapich-discuss
> Sent: Thursday, June 17, 2021 10:42 AM
> To: mvapich-discuss at lists.osu.edu
> Subject: [Mvapich-discuss] Excessive Memory Usage By Vbufs
>
>
>
> I got an odd message that seemed like my email bounced.  So, just sending
> this again to make sure it goes through.
>
>
>
> ---------- Forwarded message ---------
> From: Derek Gaston <friedmud at gmail.com>
> Date: Wed, Jun 16, 2021 at 3:33 PM
> Subject: Excessive Memory Usage By Vbufs
> To: <mvapich-discuss at cse.ohio-state.edu>
>
>
>
> Hello all,
>
>
>
> We're trying to track down an issue that we can see with MVAPICH 2.3.5,
> but not with OpenMPI.
>
>
>
> What's happening is that sending _many_ small messages with isend or
> issend causes allocate_vbuf_pool to grow incredibly large, and the memory
> is not released until MPI_Finalize.  My suspicion is that the messages are
> small enough that eager sends create temporary buffers that are not being
> freed once the send completes (it seems like those buffers should get freed
> by the corresponding MPI_Wait).
>
>
>
> To test this out I wrote a tiny C++ program that you can find here:
> https://gist.github.com/friedmud/9533d5997f06414c25f8c5c57a1eaf37
> (it needs a C++11-compliant compiler)
>
>
>
> The configuration parameters are all at the top - and what it does is send
> an array of doubles to every other process on COMM_WORLD.  Nothing
> earth-shattering.
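>
> In case the link gets mangled again, here is roughly what the program does
> (a trimmed-down sketch, not the exact gist - the sizes/counts below are
> placeholders for the configuration parameters mentioned above):
>
>   #include <mpi.h>
>   #include <vector>
>   #include <cstdio>
>
>   int main(int argc, char** argv)
>   {
>       const int num_messages = 100;   // rounds of sends (placeholder)
>       const int msg_size     = 1000;  // doubles per message (placeholder)
>
>       MPI_Init(&argc, &argv);
>
>       int rank = 0, size = 0;
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>       MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>       std::vector<double> send_buf(msg_size, static_cast<double>(rank));
>       std::vector<std::vector<double>> recv_bufs(size,
>                                                  std::vector<double>(msg_size));
>
>       for (int m = 0; m < num_messages; ++m) {
>           std::vector<MPI_Request> requests;
>
>           // Post a receive from every other rank, then a nonblocking send
>           // of the doubles to every other rank.
>           for (int other = 0; other < size; ++other) {
>               if (other == rank) continue;
>               MPI_Request req;
>               MPI_Irecv(recv_bufs[other].data(), msg_size, MPI_DOUBLE,
>                         other, m, MPI_COMM_WORLD, &req);
>               requests.push_back(req);
>           }
>           for (int other = 0; other < size; ++other) {
>               if (other == rank) continue;
>               MPI_Request req;
>               MPI_Isend(send_buf.data(), msg_size, MPI_DOUBLE,
>                         other, m, MPI_COMM_WORLD, &req);
>               requests.push_back(req);
>           }
>
>           // Every request completes here, yet the vbuf pool keeps growing.
>           MPI_Waitall(static_cast<int>(requests.size()), requests.data(),
>                       MPI_STATUSES_IGNORE);
>       }
>
>       if (rank == 0)
>           std::printf("done: %d rounds of %d-double messages\n",
>                       num_messages, msg_size);
>
>       MPI_Finalize();
>       return 0;
>   }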
>
>
>
> You can see the results below when running on 576 procs (and using
> Gperftools instrumentation to check the memory usage for one process).
> What's happening is that for message sizes of less than 2000 doubles (under
> ~16 KB) allocate_vbuf_pool is using a large amount of memory.  Once the
> message size goes over 2000 doubles, the memory drops back down (my theory:
> because the buffer is then used directly instead of being copied to a
> temporary buffer for eager sending).
>
>
>
> Note that the memory is being reported just before and after
> MPI_Finalize.  Finalize seems to release all of the memory... so it's not
> being "lost"... it's just not getting freed up once the send is done (and
> maybe not being reused well enough?).
>
>
>
> Any suggestions here?
>
>
>
> Thanks!
>
>
>
> Derek
>
>
>
>
>
>
>
> MPI type | Num procs (sent-received) | Message size (doubles) | Initial (MB) | Before MPI_Finalize (MB) | Final (MB) | Top function
> ---------|---------------------------|------------------------|--------------|--------------------------|-----------|---------------------------
> MVAPICH  | 576 (57500)               | 100                    | 0            | 48.4                     | 0         | allocate_vbuf_pool
> MVAPICH  | 576 (57500)               | 1000                   | 0            | 534.1                    | 0         | allocate_vbuf_pool
> MVAPICH  | 576 (57500)               | 10000                  | 0            | 68                       | 0         | MPIU_Handle_indirect_init
> MVAPICH  | 576 (57500)               | 100000                 | 0            | 68.1                     | 0         | MPIU_Handle_indirect_init
>
>