[mvapich-discuss] cannot allocate CQ

Joshua Bernstein jbernstein at penguincomputing.com
Tue Jun 24 20:21:21 EDT 2008


Amit,

Amit H Kumar wrote:
> reason I don't understand the difference in running it as an SGE job as
> opposed to running it locally on the compute node. Could it be the shell?
> Can anyone please help me dig into this issue?

I know Christian already answered your question and showed you how to 
get it to work, but I thought it would be valuable to explain WHY adding 
the ulimit setting to the SGE script works.

The difference is that when you run the job outside of SGE, sshd is 
the daemon that forks off your executable. A child process inherits 
the ulimit settings of its parent, so your program picks up sshd's 
limits and is able to execute.

When you launch the job from inside SGE, however, sge_execd is 
responsible for forking off your executable, so it must also have a 
proper ulimit setting. This idea applies to most schedulers, not just 
SGE, so for TORQUE (or even PBSPro), you'd have to apply the ulimit 
setting to the pbs_mom daemon.
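
If you want to see the inheritance for yourself, here is a minimal 
sketch in Python. It assumes the limit in question is the locked-memory 
limit (ulimit -l, RLIMIT_MEMLOCK), which is my guess at the usual 
culprit for this error and not something confirmed in this thread. The 
helper name child_memlock_limit is made up for illustration. It reads 
the limit in the parent, forks a child, and reads the limit there too.

```python
import resource
import subprocess
import sys


def child_memlock_limit():
    """Start a child Python process and return the (soft, hard)
    RLIMIT_MEMLOCK values that the child sees.

    Hypothetical helper for illustration only.
    """
    code = (
        "import resource; "
        "print(*resource.getrlimit(resource.RLIMIT_MEMLOCK))"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    soft, hard = (int(value) for value in result.stdout.split())
    return soft, hard


# Limits of this process, inherited from whatever started it
# (sshd, sge_execd, pbs_mom, ...).
parent = resource.getrlimit(resource.RLIMIT_MEMLOCK)

# Limits seen by a process that this one starts.
child = child_memlock_limit()

print("parent (soft, hard):", parent)
print("child  (soft, hard):", child)
```

Run it once from an ssh login on the compute node and once from inside 
a scheduler job. The child should always match its parent, while the 
values themselves may differ between the two environments.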

-Joshua Bernstein
Software Engineer
Penguin Computing

