[mvapich-discuss] (no subject)
Troy Telford
ttelford at linuxnetworx.com
Tue Mar 21 17:37:36 EST 2006
On Tue, 21 Mar 2006 11:24:34 -0700, Sayantan Sur <surs at cse.ohio-state.edu>
wrote:
> I have some questions about how you are running these programs:
>
> 1) Are you *simultaneously* running three MPI applications on a cluster,
> with memory usage configured to be 70%, 1% and 75%?
>
> OR
>
> 2) Are you starting the job which uses 70% of memory and while that is
> executing, you start the job with 1% memory and that finishes (but the
> 70% memory job is still running) ... on top of that you are starting the
> job with 75% memory consumption? Even in this mode, the total memory
> requirements imposed on the nodes is 145% of their capacity.
>
> The memory consumed by each process should be freed as soon as the
> process has ended. However, if the process (ie. HPL) which is running,
> still has the memory allocated, then there is not much we can do :-)
When running HPL, hpl.dat can contain multiple problem sizes. xhpl then:
* reads the config file and runs one problem size (until it completes the
problem),
* starts execution on the next problem size after the previous one is
finished, and
* frees the memory used for the previous problem size before allocating
the memory for the next one. (ie. there is only one job running the
entire time, and the process(es) don't exit until all problem sizes are
solved.)
So, if I were to specify multiple problem sizes with HPL, whose memory
usage would amount to 70% of total memory, then 1% of total memory, then
70% of total memory, I would expect the amount of allocated memory to
follow a '70%-1%-70%' pattern (not counting the amount of memory used by
other system processes, which may be 3-4%). This is what happens with the
other MPI implementations I've used.
However, with MVAPICH (and -DLAZY_MEM_UNREGISTER), memory allocation
follows a pattern more like 70%-71%-141%. When the hpl process finishes
one problem size and moves on to the next, no memory is freed -- but new
memory /is/ allocated. I'd say it's something like a memory leak, because
memory is allocated and never freed until the process exits; but I
suspect 'memory leak' is not the correct term. (And it appears that
-DLAZY_MEM_UNREGISTER has something to do with this behavior.)
> Can you please tell us if these MPI implementations had the registration
> caching mechanism a.k.a. -DLAZY_MEM_UNREGISTER enabled during these
> runs? IMHO, if you were able to get this config to run on other MPI
> implementations (which required memory registration) to work without
> `caching' of registrations, you should be able to do the exact same
> thing with MVAPICH, by disabling -DLAZY_MEM_UNREGISTER.
This may be true, but I can say that a Google search for
LAZY_MEM_UNREGISTER is pretty sparse, and only turned up 38 entries. All
of them were with regard to one of three MPI implementations: MVICH,
MVAPICH, and MPICH (using the ch_vapi interface).
In other words, LAZY_MEM_UNREGISTER doesn't exist in the other MPIs I've
used.
> Right. As long as you are aware of the performance implications of
> turning registration cache off, it should be fine. There will be no
> other side effects
That I can live with; although I do have one final question: Can
LAZY_MEM_UNREGISTER be tuned at run-time, or only at compile-time?
(ie. can I set MVAPICH to be less... lazy... at unregistering memory with
a command-line option to mpirun?)
--
Troy Telford