[Rocks-Discuss] [mvapich-discuss] Re: mvapich locked memory limit and queuing system

Steve Jones stevejones at stanford.edu
Thu Oct 11 14:02:00 EDT 2007


Hi Noam.

The Cisco-OFED Roll for Rocks includes the memory-locking updates, and
the PBS Roll and MOAB Roll also implement the memory-locking changes
(plus a larger file-descriptor limit for bigger clusters), shown here:

ulimit -n 4096
ulimit -s unlimited
ulimit -l unlimited

Include the parameter(s) you need in your SGE startup script,
distribute it however you choose (extend-compute.xml, script it, copy
it), and restart the daemon on the compute nodes.
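
In case it helps, here's a rough sketch of where those lines end up,
assuming an init-style execd startup script (the script name, paths and
variable names below are placeholders -- your install will differ):

case "$1" in
start)
   # raise the limits before sge_execd starts, so the daemon and every
   # job it spawns inherit them
   ulimit -n 4096
   ulimit -s unlimited
   ulimit -l unlimited
   $SGE_ROOT/bin/$SGE_ARCH/sge_execd     # placeholder path
   ;;
esac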

I've tested the Cisco-OFED Roll with the unlimited locked memory change
applied to the SGE startup script, and it worked in the lab.

Let us know how it goes.

Steve

Quoting Jeff Squyres <jsquyres at cisco.com>:

> I'm afraid I don't know much about SGE, but this issue is common to
> several resource managers (and any application that uses the OFED verbs
> stack / requires locked memory -- not just your favorite MPI
> implementation).  The issue is twofold:
>
> 1. Ensure that the resource manager daemons on each cluster node start
> with unlimited locked memory.  Note that changes put in
> /etc/security/limits.conf to set unlimited locked memory are a PAM
> thing; they affect user logins, for example.  Hence, such changes do
> not affect jobs started by init.d scripts.  So you need to manually
> ensure that your daemons are started with unlimited locked memory.
> This typically entails setting the limit *before* su'ing over to start
> the resource manager in the relevant init.d script, but the exact
> details will vary (there's a sketch of this pattern just after #2,
> below).
>
> 2. As already mentioned, many (most/all?) resource managers rightfully
> want to control the resources that are allocated to a job.  This may
> include locked memory.  If so, ensure that your resource manager is
> setting unlimited locked memory for your jobs.  The details of how to
> do this will obviously be resource manager-specific -- consult their
> docs.
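>
> For #1, a minimal sketch of the pattern I mean, with a hypothetical
> daemon, user, and path (adapt it to whatever your RM's init.d script
> already does):
>
> -----
> #!/bin/sh
> # excerpt from a hypothetical resource manager init.d script
>
> start() {
>     # raise the limit in the root shell *before* dropping privileges,
>     # so the daemon -- and every job it launches -- inherits it
>     ulimit -l unlimited
>     su rmuser -c "/opt/rm/sbin/rm_daemon"
> }
> -----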
>
> Note that #2 will be ineffectual until #1 has been completed and you
> have restarted all your resource manager daemons across the cluster
> with unlimited locked memory limits.  Specifically: a user process can
> *decrease* the locked memory limit, but it cannot *increase* it above
> the limit from which it was started.  Hence, if your RM daemons start
> with the default 32 KB locked memory limit, they are unable to increase
> it above 32 KB even if you have configured the RM to run jobs with
> unlimited locked memory.
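>
> You can see the decrease-but-not-increase behavior in any shell as a
> non-root user, e.g.:
>
> -----
> #!/bin/sh
>
> ulimit -l 16          # lowering the limit succeeds
> ulimit -l unlimited   # raising it back fails; the hard limit is now 16 KB
> ulimit -l             # still prints 16
> -----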
>
> I usually test my entire cluster by running this trivial script on
> every node via the resource manager:
>
> -----
> #!/bin/sh
>
> # print this node's name only if its locked memory limit is wrong
> foo=`ulimit -l`
> if test "$foo" != "unlimited"; then
>     hostname
> fi
> -----
>
> The only hostnames you see are the nodes that do *not* have their
> locked memory limit set properly.
>
> Hope this helps.
>
>
>
> On Oct 10, 2007, at 4:22 PM, John Leidel wrote:
>
>> Noam, I've also seen this happen with SGE 6.0 [I suggest also posting
>> your question to the SGE users list].  It has to do with the SGE_EXECD
>> arbitrarily setting its memlimits very low.  As such, all child
>> processes retain the low ceiling.  Check the SGE startup scripts to see
>> if the execd startup section is setting the memlimits.
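>>
>> A quick way to check is something like this (the paths are just
>> guesses -- use wherever your execd startup script actually lives):
>>
>> grep -n ulimit /etc/init.d/sgeexecd $SGE_ROOT/default/common/sgeexecd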
>>
>> cheers
>> john
>>
>> On Wed, 2007-10-10 at 16:07 -0400, Noam Bernstein wrote:
>>> Has anyone noticed a conflict between OFED/mvapich's desire for a
>>> large locked memory limit and the queuing system?  I've upped the
>>> memlock limit in /etc/security/limits.conf (and rebooted, so the
>>> queuing system daemons should be seeing the larger limit).  This
>>> allows me to run interactively.  However, jobs submitted through our
>>> queuing system (SGE 6.0) still see the low memlock limit (32 kB), and
>>> can't increase the limit.  I found a brief discussion of a similar
>>> problem in Torque (this thread:
>>> http://www.beowulf.org/archive/2006-May/015559.html
>>> ), but no apparent solution.
>>>
>>> Has anyone seen this problem, and preferably discovered a solution?
>>>
>>> 											thanks,
>>> 											Noam
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
> -- 
> Jeff Squyres
> Cisco Systems



