[mvapich-discuss] MV2_USE_LAZY_MEM_UNREGISTER and memory usage?

Xie Min xmxmxie at gmail.com
Wed Feb 4 07:43:40 EST 2009


We are building a cluster that uses InfiniBand as the interconnect; each
node has two Intel Xeon E5450 CPUs (4 cores per CPU) and 16GB of memory.

We installed mvapich2-1.2 on this cluster and are using HPCC to run some
tests, but when we enlarge the memory scale of HPCC beyond a certain
point, we often get an "Out of memory" error.

For example:
    On 8 nodes, we run HPCC with 64 tasks, so each node runs 8 tasks
(one task per CPU core). We use "top" to view the memory usage of the
HPCC tasks. If the memory scale of each task is set to 1.2/1.3GB (as
listed in the "RES" column of "top"), the HPCC tasks exit after running
for a while (apparently during the Linpack phase). Using "dmesg", we
found an "Out of memory" error.

    We browsed the mvapich2 user guide and found the
"MV2_USE_LAZY_MEM_UNREGISTER" parameter, which controls whether the
Pin-Down Cache is used. We set MV2_USE_LAZY_MEM_UNREGISTER to 0 and ran
the HPCC tests again; now, even when we set the memory scale of each
HPCC task to 1.6/1.7GB (RES column of "top"), HPCC runs to completion
without being killed by the OS.

Since each node in our cluster has 16GB of physical memory (2GB per CPU
core on average), we are wondering why each HPCC task can use only
1.2/1.3GB of memory when the Pin-Down Cache is enabled.
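
To put numbers on it, this is the per-node budget we are looking at (all
figures taken from the "top" output mentioned above):

    # Back-of-the-envelope per-node memory budget for the 8-node / 64-task run.
    tasks_per_node = 8
    node_mem_gb = 16.0

    for res_per_task_gb in (1.3, 1.7):   # RES per task: killed vs. working case
        used_gb = tasks_per_node * res_per_task_gb
        print("RES %.1f GB/task: %.1f GB resident, %.1f GB apparently free"
              % (res_per_task_gb, used_gb, node_mem_gb - used_gb))
    # 1.3 GB/task -> 10.4 GB resident, ~5.6 GB apparently free (yet OOM, cache on)
    # 1.7 GB/task -> 13.6 GB resident, ~2.4 GB apparently free (works, cache off)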

Using the OSU benchmarks, we found that if the Pin-Down Cache is disabled,
osu_latency performance degrades for long messages, so we would still
like to use the Pin-Down Cache when running HPCC at large memory scale.

BTW, our cluster nodes have no hard disks; they boot via BOOTP and mount
a common directory from a file server using Lustre, so there is no swap
on the nodes either.

How should we set the mvapich2 parameters to run HPCC at large memory
scale (e.g. so that each HPCC task can use more than 1.7GB)?
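
For reference, these are the HPL problem sizes (the Ns value in our
hpccinf.txt) that roughly correspond to the per-task figures above,
assuming the distributed matrix dominates the footprint at 8 bytes per
element:

    # Rough HPL problem-size arithmetic for 64 tasks, 8 bytes per matrix element.
    import math

    ntasks = 64
    for target_gb_per_task in (1.3, 1.7):
        total_bytes = target_gb_per_task * 2**30 * ntasks
        n = int(math.sqrt(total_bytes / 8))   # N*N*8 bytes ~= total matrix memory
        print("~%.1f GB/task -> Ns around %d" % (target_gb_per_task, n))
    # ~1.3 GB/task -> Ns around 105000; ~1.7 GB/task -> Ns around 120000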

Thanks!
