[mvapich-discuss] waitsome/testsome memory allocation

Matthew Koop koop at cse.ohio-state.edu
Fri Oct 12 12:24:23 EDT 2007


1MB is the minimum size allocated. Depending on the application behavior
and the library configuration, much more can be allocated. When the
application does not pre-post its receives and sends quite a few messages,
this sort of behavior is largely unavoidable. I've done some further
testing and have not found any evidence of a leak.

If this problem persists and memory usage is a higher concern than
performance (and you do not want to optimize your MPI application), you
can change the message size threshold at which MVAPICH switches to the
'rendezvous' protocol -- which ensures data is not sent until the
corresponding receive has been posted.

The following environment variable changes this threshold:
VIADEV_RENDEZVOUS_THRESHOLD=<msg size>
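
For example (the value below is only illustrative -- pick it to suit your
own message sizes; messages larger than the threshold, in bytes, will go
over the rendezvous path):

VIADEV_RENDEZVOUS_THRESHOLD=8192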

Again, the real solution is to pre-post the receives; a rough sketch of
that pattern is below. Let us know if you notice anything else or if the
memory usage has decreased after pre-posting the receives.
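
As an illustration (this is not your code -- the buffer arrays, counts,
and the tag are hypothetical placeholders), the idea is to post every
irecv before any isend is started, and only then drive completion with
waitsome/testsome:

#include <mpi.h>
#include <stdlib.h>

/* Post every receive before any send, so incoming eager data lands
 * directly in the user buffers instead of the library's
 * unexpected-message pool. */
void exchange_packed(void *recvbufs[], int recvcounts[], int sources[],
                     int nrecv,
                     void *sendbufs[], int sendcounts[], int dests[],
                     int nsend, MPI_Comm comm)
{
    int total = nrecv + nsend;
    MPI_Request *reqs    = malloc(total * sizeof(MPI_Request));
    int         *indices = malloc(total * sizeof(int));
    MPI_Status  *stats   = malloc(total * sizeof(MPI_Status));
    int i, outcount, remaining = total;

    /* 1. Pre-post all of the receives first. */
    for (i = 0; i < nrecv; i++)
        MPI_Irecv(recvbufs[i], recvcounts[i], MPI_PACKED,
                  sources[i], 0, comm, &reqs[i]);

    /* 2. Only then start the sends of the packed messages. */
    for (i = 0; i < nsend; i++)
        MPI_Isend(sendbufs[i], sendcounts[i], MPI_PACKED,
                  dests[i], 0, comm, &reqs[nrecv + i]);

    /* 3. Drain everything; MPI_Testsome works the same way if you have
     *    other work to overlap with the communication. */
    while (remaining > 0) {
        MPI_Waitsome(total, reqs, &outcount, indices, stats);
        if (outcount == MPI_UNDEFINED)
            break;
        remaining -= outcount;
    }

    free(stats);
    free(indices);
    free(reqs);
}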

Thanks,

Matt




On Wed, 10 Oct 2007, Justin wrote:

> The number of processes ranges from 16 to 2048.  I don't have
> information readily available on how many sends we have or how large
> the messages are; I'll have to look into that before I can quote hard
> numbers.  Each processor has to communicate multiple arrays of
> variables, multiple times, each iteration.  When possible the messages
> are packed into a single message, but I'm sure there are many messages
> every iteration.  It is possible we are reaching a high-water mark on
> each process but not all of them have reached it yet.  I'll have to
> look more closely at the allocation on long runs.  The allocation does
> not occur in 1MB jumps; I've seen it as high as 18 MB.  I'll try to get
> some more information to you soon.
>
> Justin
>
> Matthew Koop wrote:
> > How many processes are there in your execution, and how many sends are
> > issued to any given process? If receives are not pre-posted, the MPI
> > library may have to buffer the messages on the receive side until the
> > receives are posted (assuming your messages are < 8K).
> >
> > Memory will need to be allocated to hold them and is not freed -- under
> > the assumption that it will be needed again (these buffers are reused).
> > As you note, though, I would expect there to be a high-water level with
> > your application pattern. The jumps you see in memory allocation are
> > likely the blocks that are allocated for receive buffers. Are the jumps
> > ~1MB in size (per process)?
> >
> > We'll also take a closer look on our side given this information. I
> > haven't been able to replicate the behavior here with a simple test of
> > repeated unexpected messages.
> >
> > Matt
> >
> > On Wed, 10 Oct 2007, Justin wrote:
> >
> >
> >> Hi, after looking through the MPI source code I have come to the
> >> conclusion that these allocations are occurring because we post the
> >> receives later than the sends.  Many of the sends complete prior to
> >> the posting of their corresponding receives, so handles must be
> >> allocated to receive them before they are placed in the unexpected
> >> message queue.  Is it possible that the handles are not being deleted
> >> after the requests are posted?  Does your implementation hold on to
> >> memory after it is allocated, under the assumption that it will be
> >> used again in the future?
> >>
> >> I'm concerned that there might be a leak in your implementation with
> >> this communication pattern.  If you do retain memory under the
> >> assumption that it will be used again eventually, the program should
> >> reach a high-water mark.  Unfortunately this does not seem to be the
> >> case, because our usage keeps going up slowly.  The average is going
> >> up because occasionally a processor will allocate a large amount of
> >> memory.
> >>
> >> We are going to try to post the receives up front, but we would also
> >> like to verify that there isn't a leak.
> >>
> >> Justin
> >>
> >> Justin wrote:
> >>
> >>> Hi,
> >>>
> >>> The relevant stack traces on these allocations is the following:
> >>>
> >>>
> >>> 1. /g/g20/luitjens/SCIRunMemory/dbg/lib/libCore_Thread.so [0x2a95c42284]
> >>> 2. /lib64/tls/libc.so.6 [0x2a9a7152b0]
> >>> 3. /lib64/tls/libc.so.6(gsignal+0x3d) [0x2a9a71521d]
> >>> 4. /lib64/tls/libc.so.6(abort+0xfe) [0x2a9a716a1e]
> >>> 5. /usr/local/tools/gnu/gcc/3.4.4_RH_chaos_3_x86_64/usr/lib64/libstdc++.so.6 in __gnu_cxx::__verbose_terminate_handler()
> >>> 6. /usr/local/tools/gnu/gcc/3.4.4_RH_chaos_3_x86_64/usr/lib64/libstdc++.so.6 [0x2a9a499076]
> >>> 7. /usr/local/tools/gnu/gcc/3.4.4_RH_chaos_3_x86_64/usr/lib64/libstdc++.so.6 [0x2a9a4990a3]
> >>> 8. /usr/local/tools/gnu/gcc/3.4.4_RH_chaos_3_x86_64/usr/lib64/libstdc++.so.6 [0x2a9a4990b6]
> >>> 9. /usr/local/tools/gnu/gcc/3.4.4_RH_chaos_3_x86_64/usr/lib64/libstdc++.so.6(__cxa_call_unexpected+0x48) [0x2a9a498fc8]
> >>> a. /g/g20/luitjens/SCIRunMemory/dbg/lib/libCore_Malloc.so(malloc+0x63) [0x2a980c92ff]
> >>> b. /g/g20/luitjens/mpi//lib/libmpich.so.1.0(MPID_SBiAllocate+0x39) [0x2a98bdce39]
> >>> c. /g/g20/luitjens/mpi//lib/libmpich.so.1.0(MPID_SBalloc+0x2b) [0x2a98bdcf8b]
> >>> d. /g/g20/luitjens/mpi//lib/libmpich.so.1.0(MPID_Msg_arrived+0xe3) [0x2a98bda2b3]
> >>> e. /g/g20/luitjens/mpi//lib/libmpich.so.1.0(viadev_incoming_eager_start+0x43) [0x2a98be8753]
> >>> f. /g/g20/luitjens/mpi//lib/libmpich.so.1.0(viadev_process_recv+0x2ef) [0x2a98be9b6f]
> >>> 10. /g/g20/luitjens/mpi//lib/libmpich.so.1.0(MPID_DeviceCheck+0xde) [0x2a98bea77e]
> >>> 11. /g/g20/luitjens/mpi//lib/libmpich.so.1.0(MPI_Testsome+0x45) [0x2a98be1b35]
> >>>
> >>> And
> >>>
> >>> 1. /g/g20/luitjens/SCIRunMemory/dbg/lib/libCore_Thread.so [0x2a95c42284]
> >>> 2. /lib64/tls/libc.so.6 [0x2a9a7152b0]
> >>> 3. /lib64/tls/libc.so.6(gsignal+0x3d) [0x2a9a71521d]
> >>> 4. /lib64/tls/libc.so.6(abort+0xfe) [0x2a9a716a1e]
> >>> 5. /usr/local/tools/gnu/gcc/3.4.4_RH_chaos_3_x86_64/usr/lib64/libstdc++.so.6 in __gnu_cxx::__verbose_terminate_handler()
> >>> 6. /usr/local/tools/gnu/gcc/3.4.4_RH_chaos_3_x86_64/usr/lib64/libstdc++.so.6 [0x2a9a499076]
> >>> 7. /usr/local/tools/gnu/gcc/3.4.4_RH_chaos_3_x86_64/usr/lib64/libstdc++.so.6 [0x2a9a4990a3]
> >>> 8. /usr/local/tools/gnu/gcc/3.4.4_RH_chaos_3_x86_64/usr/lib64/libstdc++.so.6 [0x2a9a4990b6]
> >>> 9. /usr/local/tools/gnu/gcc/3.4.4_RH_chaos_3_x86_64/usr/lib64/libstdc++.so.6(__cxa_call_unexpected+0x48) [0x2a9a498fc8]
> >>> a. /g/g20/luitjens/SCIRunMemory/dbg/lib/libCore_Malloc.so(malloc+0x63) [0x2a980c92ff]
> >>> b. /g/g20/luitjens/mpi//lib/libmpich.so.1.0(MPID_SBiAllocate+0x39) [0x2a98bdce39]
> >>> c. /g/g20/luitjens/mpi//lib/libmpich.so.1.0(MPID_SBalloc+0x2b) [0x2a98bdcf8b]
> >>> d. /g/g20/luitjens/mpi//lib/libmpich.so.1.0(MPID_Msg_arrived+0xe3) [0x2a98bda2b3]
> >>> e. /g/g20/luitjens/mpi//lib/libmpich.so.1.0(smpi_net_lookup+0xc24) [0x2a98bd3bd4]
> >>> f. /g/g20/luitjens/mpi//lib/libmpich.so.1.0(MPID_SMP_Check_incoming+0x2d5) [0x2a98bd4ee5]
> >>> 10. /g/g20/luitjens/mpi//lib/libmpich.so.1.0(MPID_DeviceCheck+0x185) [0x2a98bea825]
> >>> 11. /g/g20/luitjens/mpi//lib/libmpich.so.1.0(MPI_Testsome+0x45) [0x2a98be1b35]
> >>>
> >>>
> >>> I have tried turning off the ELAN optimizations and the allocations
> >>> still occur.  The commonality in the stack traces appears to be the
> >>> calls to SBalloc.  Is it possible that there is a leak in the MPI
> >>> library that we are running into?  When is the memory allocated in
> >>> this function freed?  If the same communication pattern is occurring
> >>> over and over, what would cause this function to keep allocating
> >>> memory instead of reusing the memory that has already been allocated?
> >>>
> >>> Thanks
> >>> Justin
> >>>
> >>>
> >>> Justin wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I am tracking down some memory issues in our code, and I am finding
> >>>> strange memory allocations occurring within MPI_Waitsome and
> >>>> MPI_Testsome.  In one section of our code we use MPI_Pack and
> >>>> MPI_Unpack to combine a bunch of small messages.  We then send out
> >>>> the packed messages using isend, and the receiving processors post
> >>>> irecvs.  To complete the communication we use both testsome and
> >>>> waitsome.  What we are seeing is that processors start by allocating
> >>>> a small amount of memory, but as the code marches forward in time,
> >>>> processors allocate larger and larger amounts within one of these
> >>>> MPI calls.  For example, early on the allocation might be a couple
> >>>> of KB, but eventually it gets to around 1MB, and I've even seen it
> >>>> as high as 14MB.  I predict that if I ran it further it would
> >>>> allocate much more than 14MB.  Processors are not all allocating
> >>>> this memory at the same time.  In other parts of the code we do not
> >>>> use packing, and we do not see this allocation behavior.  I'm
> >>>> guessing that somewhere we are either misusing packing or some
> >>>> other MPI feature and are causing MPI to leak.
> >>>>
> >>>> I was wondering if you could tell me why testsome/waitsome would
> >>>> allocate memory, as that could provide a good hint as to how we are
> >>>> misusing MPI.
> >>>>
> >>>> Currently we are using MVAPICH version 0.9.9 on Atlas at LLNL.
> >>>>
> >>>> Thanks,
> >>>> Justin
>


