[mvapich-discuss] cannot allocate CQ
Joshua Bernstein
jbernstein at penguincomputing.com
Tue Jun 24 20:21:21 EDT 2008
Amit,
Amit H Kumar wrote:
> reason I don't understand the difference in running it as an SGE job as
> opposed to running it locally on the compute node. Could it be the shell?
> Can anyone please help me dig into this issue?
I know Christian already answered your question and showed you how to
get it to work, but I thought it would be valuable to explain WHY adding
the ulimit setting to the SGE script works.
The difference is that when you run the job outside of SGE, sshd is
the daemon that forks off your executable. The ulimit setting is
inherited from sshd, and thus the program is able to execute.
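To see the inheritance in isolation, here is a minimal Python sketch. It
assumes the limit in question is the locked-memory limit (RLIMIT_MEMLOCK,
i.e. `ulimit -l`), which this thread does not state explicitly. The helper
name inherited_memlock is made up for illustration; the forked process that
lowers its limit before exec only stands in for a launcher daemon.

```python
import resource
import subprocess
import sys

# Child program: print the locked-memory limit it inherited from its parent.
REPORT = "import resource; print(*resource.getrlimit(resource.RLIMIT_MEMLOCK))"


def inherited_memlock(parent_soft=None):
    """Return the (soft, hard) RLIMIT_MEMLOCK seen by a freshly spawned child.

    If parent_soft is given, the soft limit is changed in the forked process
    just before exec, standing in for a launcher that runs with that limit.
    """
    def set_soft_limit():
        _, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
        resource.setrlimit(resource.RLIMIT_MEMLOCK, (parent_soft, hard))

    result = subprocess.run(
        [sys.executable, "-c", REPORT],
        capture_output=True,
        text=True,
        check=True,
        preexec_fn=set_soft_limit if parent_soft is not None else None,
    )
    soft, hard = (int(value) for value in result.stdout.split())
    return soft, hard


if __name__ == "__main__":
    print("this process:", resource.getrlimit(resource.RLIMIT_MEMLOCK))
    print("child, limit inherited unchanged:", inherited_memlock())
    print("child of a launcher with soft limit 0:", inherited_memlock(0))
```

The child never sets a limit itself; whatever it reports was handed down by
the process that started it.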
When you launch the job from inside SGE, however, sge_execd is
responsible for forking off your executable, so it must also have a
proper ulimit setting. This idea applies to most schedulers, not just
SGE, so for TORQUE (or even PBSPro), you'd have to apply the ulimit
setting to the pbs_mom daemon.
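One way to check what a running daemon will pass on is to read its entry
under /proc. The sketch below is Linux-only and assumes the relevant limit is
"Max locked memory", which this thread does not state explicitly. The helper
name memlock_from_proc is made up for illustration. The demo inspects its own
process; for sge_execd or pbs_mom you would pass that daemon's PID instead,
and reading another user's process may require root.

```python
import os


def memlock_from_proc(pid):
    """Return the (soft, hard) 'Max locked memory' fields for a PID.

    The values are returned as strings exactly as the kernel prints them
    in /proc/<pid>/limits: a byte count or the word 'unlimited'.
    """
    label = "Max locked memory"
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith(label):
                # Remaining columns are: soft limit, hard limit, units.
                soft, hard = line[len(label):].split()[:2]
                return soft, hard
    raise LookupError(f"no '{label}' entry for pid {pid}")


if __name__ == "__main__":
    # Demo on this process; substitute the daemon's PID on a compute node.
    print(memlock_from_proc(os.getpid()))
```

If the daemon reports a small value here, jobs it forks will start with that
same small value no matter what an interactive login shell shows.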
-Joshua Bernstein
Software Engineer
Penguin Computing