[mvapich-discuss] cannot allocate CQ

Amit H Kumar AHKumar at odu.edu
Wed Jun 25 09:31:38 EDT 2008


Hi Joshua,

Thank you for the detailed explanation. More comments below.

> I know that Christian already answered your question to show you how to
> get it to work, but I thought it would be valuable to explain WHY adding
> the ulimit setting to the SGE script works.
>
> The difference between running it outside of SGE is that when you run
> the job outside of SGE, the SSHD daemon is actually the daemon forking
> off your executable. The ulimit setting is inherited from SSHD daemon,
> and thus the program is able to execute.
>
> Though when you launch the job from inside of SGE, the sge_execd is
> actually responsible for forking off your executable, thus it must also
> have a proper ulimit setting. This idea applies to most schedulers, not
> just SGE, so for TORQUE (or even PBSPro), you'd have to apply the ulimit
> setting to the pbs_mom daemon.
>
That's interesting. I was under the impression that, since SGE uses SSH/RSH
to log on to the compute nodes, it would pick up the settings modified
in /etc/init.d/sshd. But now I believe that's not how it works. Since the
sge_execd daemon is the one that forks the SSH process, it is the
sge_execd process that must have the unlimited memlock capability.
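
To make the inheritance concrete, here is a minimal sketch, assuming a
Linux node with Python 3. It is not SGE or MVAPICH code: the script stands
in for whichever daemon forks the job, and the function names and the 64 KB
cap are made up for illustration. It lowers its own soft memlock limit and
then shows that a child process starts with the same value.

```python
import resource
import subprocess
import sys


def lower_memlock_soft_limit(cap_bytes):
    """Lower this process's soft RLIMIT_MEMLOCK to at most cap_bytes.

    Returns the soft limit in effect afterwards. Lowering a soft limit
    needs no privileges; raising it past the hard limit would.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    if soft == resource.RLIM_INFINITY or soft > cap_bytes:
        soft = cap_bytes
        resource.setrlimit(resource.RLIMIT_MEMLOCK, (soft, hard))
    return soft


def child_memlock_soft_limit():
    """Start a child process and return the soft memlock limit it reports."""
    code = "import resource; print(resource.getrlimit(resource.RLIMIT_MEMLOCK)[0])"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    # The parent plays the role of the forking daemon (hypothetical stand-in).
    parent_limit = lower_memlock_soft_limit(64 * 1024)
    print("parent soft memlock limit:", parent_limit)
    print("child soft memlock limit: ", child_memlock_soft_limit())
```

The child never sets a limit itself; it only reports what it inherited from
the process that started it. The same applies in the other direction: the
limit would have to be raised (for example with `ulimit -l unlimited`) in
whatever script starts the forking daemon, and which file that is depends
on the site.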

Joshua, is my reasoning correct?

I also have one more question:

Why didn't we have to do this for MVAPICH2 versions lower than 1.0?
What architectural changes or interfaces (e.g. VAPI) mandate this?

Thanks!
«Amit»




