[mvapich-discuss] Re: [Rocks-Discuss] mvapich locked memory limit and queuing system
Jeff Squyres
jsquyres at cisco.com
Thu Oct 11 08:14:07 EDT 2007
I'm afraid I don't know much about SGE, but this issue is common to
several resource managers (and any application that uses the OFED
verbs stack / requires locked memory -- not just your favorite MPI
implementation). The issue is twofold:
1. Ensure that the resource manager daemons on each cluster node
start with unlimited locked memory. Note that entries in
/etc/security/limits.conf that set unlimited locked memory are a PAM
mechanism; they affect user logins, for example. Hence, such entries
do not affect daemons started by init.d scripts. So you need to
manually ensure that your daemons are started with unlimited locked
memory. This typically entails raising the limit *before* su'ing over
to start the resource manager in the relevant init.d script, but the
exact details will vary.
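For concreteness, here is a minimal sketch of the two pieces involved.
The limits.conf entries are standard pam_limits syntax; the init.d
fragment is illustrative only -- the admin user name and start command
are hypothetical, not taken from any particular resource manager:
-----
# /etc/security/limits.conf -- affects PAM logins, NOT init.d daemons
*   soft   memlock   unlimited
*   hard   memlock   unlimited

# In the resource manager's init.d "start" section (sketch):
ulimit -l unlimited                       # raise the limit first, as root
su - sgeadmin -c "/path/to/start_execd"   # hypothetical user and command
-----
The ordering matters: the ulimit call must come before the su, so that
the daemon inherits the raised limit.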
2. As already mentioned, many (most/all?) resource managers
rightfully want to control the resources that are allocated to a
job. This may include locked memory. If so, ensure that your
resource manager is setting unlimited locked memory for your jobs.
The details of how to do this will obviously be resource
manager-specific -- consult your resource manager's docs.
Note that #2 will be ineffectual until #1 has been completed and you
have restarted all your resource manager daemons across the cluster
with unlimited locked memory limits. Specifically: a user process
can *decrease* the locked memory limit, but it cannot *increase* it
above the limit from which it was started. Hence, if your RM daemons
start with the default 32 kB locked memory limit, they are unable to
increase it above 32 kB even if you have configured the RM to run jobs
with unlimited locked memory.
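This decrease-only behavior is easy to see from a shell. A small
demonstration (assumes an unprivileged shell; a privileged process can
raise limits again, so the second step would succeed for root):

```shell
#!/bin/sh
# Lower the locked-memory limit in this shell; lowering is always allowed.
# Without -S or -H, this lowers both the soft and the hard limit.
ulimit -l 16
echo "limit is now `ulimit -l` kB"

# Trying to raise it back fails for an unprivileged process, because the
# hard limit was lowered too -- exactly why an RM daemon started at 32 kB
# cannot hand out more than 32 kB to its jobs.
if ulimit -l unlimited 2>/dev/null; then
    echo "raised the limit (running privileged?)"
else
    echo "cannot raise the limit back"
fi
```

The same applies to every child the shell spawns: the lowered limit is
inherited, and no descendant can undo it.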
I usually test my entire cluster by running this trivial script on
every node via the resource manager:
-----
#!/bin/sh
foo=`ulimit -l`
if test "$foo" != "unlimited"; then
    hostname
fi
-----
The only hostnames printed are those of nodes that do not have their
locked memory limits set properly.
Hope this helps.
On Oct 10, 2007, at 4:22 PM, John Leidel wrote:
> Noam, I've also seen this happen with SGE 6.0 [I suggest also posting
> your question to the SGE users list]. It has to do with the SGE_EXECD
> arbitrarily setting its memlimits very low. As such, all child
> processes retain the low ceiling. Check the SGE startup scripts to see
> if the execd startup section is setting the memlimits.
>
> cheers
> john
>
> On Wed, 2007-10-10 at 16:07 -0400, Noam Bernstein wrote:
>> Has anyone noticed a conflict between OFED/mvapich's desire for a
>> large locked memory limit and the queuing system? I've upped the
>> memlock limit in /etc/security/limits.conf (and rebooted, so the
>> queuing system daemons should be seeing the larger limit).
>> This allows me to run interactively. However, jobs submitted through
>> our queuing system (SGE 6.0) still see the low memlock limit (32 kB),
>> and can't increase the limit. I found a brief discussion of a similar
>> problem in Torque (this thread:
>> http://www.beowulf.org/archive/2006-May/015559.html
>> ), but no apparent solution.
>>
>> Has anyone seen this problem, and preferably discovered a solution?
>>
>> thanks,
>> Noam
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
--
Jeff Squyres
Cisco Systems