[mvapich-discuss] MV2_USE_LAZY_MEM_UNREGISTER and memory usage?

Xie Min xmxmxie at gmail.com
Sat Mar 14 11:06:31 EDT 2009


Today we ran some more tests.

For a 128-task HPCC run (16 nodes * 8 tasks per node), the whole test
runs successfully and we get the final result.

But for a 512-task HPCC run (64 nodes * 8 tasks per node), HPL also
freezes when MV2_USE_LAZY_MEM_UNREGISTER=1.

I have attached an input file for the 512-task HPCC run (about 1.6GB per
task); maybe you can try it on your systems to see whether it reproduces
the same problem.

Thanks.
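
For reference, mpirun_rsh passes MV2_* settings as KEY=VALUE arguments on
the command line before the executable, so a 512-task run with the
variable enabled would look something like this (the hostfile path here
is illustrative, not from the original report):

  mpirun_rsh -np 512 -hostfile ./hosts MV2_USE_LAZY_MEM_UNREGISTER=1 ./hpcc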

2009/3/12 Matthew Koop <koop at cse.ohio-state.edu>:
> Xie,
>
> Thanks for sending this information along. We've spent some time
> investigating the issue and came up with a patch that will hopefully
> resolve it. I've attached it to this email; it should be applied at the
> base directory.
>
> Please let us know if this helps with the problem,
>
> Matt
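
Applying a unified diff from the base of the source tree is typically:

  patch -p1 < fix.patch

where the filename is a placeholder for the attachment and the -p level
depends on how the diff was generated.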
>
> On Mon, 9 Feb 2009, Xie Min wrote:
>
>> The HPCC we used is HPCC 1.0.0, but we just tried HPCC 1.3.1 and it
>> seems to have the same problem.
>>
>> We have attached two hpccinf.txt files for 64 HPCC tasks:
>> hpccinf.txt.13 gives a "RES" of about 1.3GB per task, while
>> hpccinf.txt.16 gives a "RES" of about 1.6/1.7GB. Would you please try
>> them on your systems (with MV2_USE_LAZY_MEM_UNREGISTER=1)? Thanks.
>>
>> BTW, the OFED version we used is 1.3.1, physical memory on each node
>> is 16GB, and we use 8 nodes for the 64 tasks.
>>
>>
>>
>> 2009/2/7 Matthew Koop <koop at cse.ohio-state.edu>:
>> >
>> > Thanks for the additional information. I've tried here with HPCC 1.3.1 and
>> > I haven't been able to see any difference in the 'RES' or 'VIRT' memory
>> > while running.
>> >
>> > Would it be possible to send me your hpccinf.txt file so I can try
>> > to reproduce the problem more closely? We also have AS5 with kernel
>> > 2.6.18.
>> >
>> > Thanks,
>> >
>> > Matt
>> >
>> > On Thu, 5 Feb 2009, Xie Min wrote:
>> >
>> >> We use Red Hat AS5; the kernel is 2.6.18 with Lustre 1.6.6, and we
>> >> do not modify the kernel source.
>> >>
>> >> We test HPCC on two clusters. In the first cluster, each node boots
>> >> over IB and has no hard disk, so there is NO swap space. We run 64
>> >> HPCC tasks on 8 nodes (so each CPU core in a node runs one HPCC
>> >> task). When each HPCC task uses 1.2/1.3GB of memory, it is killed by
>> >> the OS with an "Out of memory" error. But with
>> >> MV2_USE_LAZY_MEM_UNREGISTER=0, each task can use 1.7GB of memory and
>> >> run successfully.
>> >>
>> >> In the second cluster, each node has a hard disk, boots from the
>> >> local disk, and HAS swap space. We run 64 HPCC tasks on 8 nodes
>> >> there too. When each HPCC task uses 1.3GB of memory, "top" shows
>> >> that swap starts being used after HPCC has run for a while, and the
>> >> node becomes very slow and stops responding to keyboard input. But
>> >> with MV2_USE_LAZY_MEM_UNREGISTER=0, each task can be set to a 1.7GB
>> >> problem size and run successfully.
>> >>
>> >> I also tried another mvapich2 parameter combination:
>> >> MV2_USE_LAZY_MEM_UNREGISTER=1 with MV2_NDREG_ENTRIES=8. In this
>> >> configuration, HPCC is still killed by the OS with an "Out of
>> >> memory" error when the per-task problem size is set to 1.3GB.
>> >>
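For context on the numbers above: 8 tasks x 1.7GB is already about 13.6GB
of a 16GB node, so any extra pinned memory can push the node into OOM or
swap. With MV2_USE_LAZY_MEM_UNREGISTER=1 the library keeps buffers
registered (pinned) after the application is done with them, and
MV2_NDREG_ENTRIES bounds the number of cached registrations, not the
bytes they pin. Below is a minimal sketch of that idea, not MVAPICH2
source; mlock() stands in for InfiniBand memory registration, and all
names and sizes are illustrative:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define CACHE_SLOTS 8            /* cf. MV2_NDREG_ENTRIES in the thread */

struct reg { void *addr; size_t len; };
static struct reg cache[CACHE_SLOTS];
static int next_victim;

/* "Register" a buffer lazily: pin it and cache the registration instead
 * of unpinning when the application is done with it. Only when a cache
 * slot is recycled does the old buffer get unpinned and released. */
static void lazy_register(void *addr, size_t len)
{
    struct reg *slot = &cache[next_victim];
    if (slot->addr) {                     /* evict the oldest entry */
        munlock(slot->addr, slot->len);
        free(slot->addr);
    }
    if (mlock(addr, len) != 0)            /* faults the pages in; they now
                                             count against RES and cannot
                                             be swapped out */
        perror("mlock");
    slot->addr = addr;
    slot->len = len;
    next_victim = (next_victim + 1) % CACHE_SLOTS;
}

int main(void)
{
    /* Each "message" buffer is handed to the cache once the app is done
     * with it; the last CACHE_SLOTS buffers stay pinned. With 8 MPI
     * tasks per node doing this, the pinned-but-unused memory is what
     * pushes a nearly full node over the edge, even though each task
     * stays within its own budget. */
    for (int i = 0; i < 64; i++) {
        size_t len = 16u << 20;           /* 16 MB per buffer */
        void *buf = malloc(len);
        if (!buf) { perror("malloc"); return 1; }
        lazy_register(buf, len);          /* ownership moves to the cache */
    }
    printf("%d buffers (%zu MB) remain pinned\n",
           CACHE_SLOTS, (size_t)CACHE_SLOTS * 16);
    return 0;
}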
>> >> 2009/2/5 Matthew Koop <koop at cse.ohio-state.edu>:
>> >> > Hi,
>> >> >
>> >> > What OS/distro are you running? Have you made any changes from
>> >> > the base configuration, such as page size?
>> >> >
>> >> > I'm taking a look at this issue on our machine as well, although I'm not
>> >> > seeing the memory change that you reported.
>> >> >
>> >> > Matt
>> >> >
>> >> >
>> >>
>> >
>> >
>>
>
-------------- next part --------------
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
8            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
326000         Ns
1            # of NBs
80           NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
16            Ps
32            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                      		Number of additional problem sizes for PTRANS
1200 10000 30000        	values of N
0                       	number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64       	values of NB
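
For reference, the problem size above is consistent with the quoted
~1.6GB per task: HPL stores the N x N matrix in double precision, so it
needs about 8*N^2 bytes in total, and 8 * 326000^2 ~= 850 GB; divided
over the P x Q = 16 x 32 = 512 process grid, that is roughly 1.66 GB per
task, before any MPI or registration-cache overhead.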

