[mvapich-discuss] (no subject)
Troy Telford
ttelford at linuxnetworx.com
Tue Mar 21 17:37:36 EST 2006
On Tue, 21 Mar 2006 11:24:34 -0700, Sayantan Sur <surs at cse.ohio-state.edu>
wrote:
> I have some questions about how you are running these programs:
>
> 1) Are you *simultaneously* running three MPI applications on a cluster,
> with memory usage configured to be 70%, 1% and 75%?
>
> OR
>
> 2) Are you starting the job which uses 70% of memory and while that is
> executing, you start the job with 1% memory and that finishes (but the
> 70% memory job is still running) ... on top of that you are starting the
> job with 75% memory consumption? Even in this mode, the total memory
> requirements imposed on the nodes is 145% of their capacity.
>
> The memory consumed by each process should be freed as soon as the
> process has ended. However, if the process (ie. HPL) which is running,
> still has the memory allocated, then there is not much we can do :-)
When running HPL, hpl.dat can contain multiple problem sizes. xhpl then:
* reads the config file and runs one problem size (until it completes the
problem),
* starts execution on the next problem size after the previous one is
finished, and
* frees the memory used for the previous problem size before allocating
the memory for the next one. (ie. there is only one job running the
entire time, and the process(es) don't exit until all problem sizes are
solved.)
So, if I were to specify multiple problem sizes with HPL, whose memory
usage would amount to 70% of total memory, then 1% of total memory, then
70% of total memory, I would expect the amount of allocated memory to
follow a '70%-1%-70%' pattern (not counting the amount of memory used by
other system processes, which may be 3-4%). This is what happens with the
other MPI implementations I've used.
However, with MVAPICH (and -DLAZY_MEM_UNREGISTER), memory allocation
follows a pattern more like 70%-71%-141%. When the hpl process finishes
one problem size and moves on to the next, no memory is freed -- but new
memory /is/ allocated. I'd say it's something like a memory leak, because
memory is allocated and never freed until the process exits; but I
suspect 'memory leak' is not the correct term. (And it appears that
-DLAZY_MEM_UNREGISTER has something to do with this behavior.)
> Can you please tell us if these MPI implementations had the registration
> caching mechanism a.k.a. -DLAZY_MEM_UNREGISTER enabled during these
> runs? IMHO, if you were able to get this config to run on other MPI
> implementations (which required memory registration) to work without
> `caching' of registrations, you should be able to do the exact same
> thing with MVAPICH, by disabling -DLAZY_MEM_UNREGISTER.
This may be true, but I can say that a Google search for
LAZY_MEM_UNREGISTER is pretty sparse, and only turned up 38 entries. All
of them were with regard to one of three MPI implementations: MVICH,
MVAPICH, and MPICH (using the ch_vapi interface).
In other words, LAZY_MEM_UNREGISTER doesn't exist in the other MPIs I've
used.
> Right. As long as you are aware of the performance implications of
> turning registration cache off, it should be fine. There will be no
> other side effects
That I can live with; although I do have one final question: Can
LAZY_MEM_UNREGISTER be tuned at run-time, or only at compile-time?
(ie. can I set MVAPICH to be less... lazy... at unregistering memory with
a command-line option to mpirun?)
--
Troy Telford