[mvapich-discuss] cannot allocate CQ
Christian Guggenberger
christian.guggenberger at rzg.mpg.de
Tue Jun 24 15:49:42 EDT 2008
On Tue, Jun 24, 2008 at 01:51:09PM -0400, Amit H Kumar wrote:
>
> Hi,
>
> MVAPICH2: 1.0.3
> HCA: Mellanox InfiniHost III Lx HCA
> IB Stack: Qlogic(SilverStorm)
> Compiled MVAPICH2-1.0.3 using the Verbs API interface.
>
> Following the user guide, I added the following line to /etc/security/limits.conf:
>     * soft memlock unlimited
> I also added the following line to /etc/init.d/sshd on all compute nodes and then
> restarted sshd on all of them:
>     ulimit -l unlimited
>
> I can run a simple hello-world program and the OSU benchmarks, but only when I run
> them locally on the compute nodes as a regular user or as root. When I run the same
> programs as an SGE job, they fail with the error messages attached below:
You'll have to add
ulimit -l unlimited
in your sge_execd startup script, as well, and restart that daemon.
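For example (the script path and name below are assumptions; many SGE installations
use /etc/init.d/sgeexecd, but adjust to wherever sge_execd is launched on your
nodes), something like:

    # /etc/init.d/sgeexecd  -- path/name assumed, adjust for your SGE install
    # Raise the locked-memory limit before sge_execd starts, so that the
    # daemon and every job it spawns inherit it (same idea as the sshd change).
    ulimit -l unlimited

    # then restart the daemon on each compute node, e.g.
    /etc/init.d/sgeexecd restart

After that, a trivial test job that just runs "ulimit -l" should report
"unlimited"; if it still shows a small value, the limit is being dropped somewhere
between sge_execd and your job's shell.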
cheers.
- Christian
>
> Also appended is the output of ulimit -a on the compute node...
>
> It seems this has been discussed on the list before, but I don't understand why
> running the programs as an SGE job behaves differently from running them locally
> on the compute node. Could it be the shell? Can anyone please help me dig into
> this issue?
>
> Thank you,
> Amit
>
> <<<<<<<snip>>>>>>>>>>
> Tracing mpds ... (this is a check to see that the mpds have started as expected)
> zorka-0-8
> zorka-0-8
> Now executing my program ...
>
> ALL TO ALL
> 0: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> 1: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> rank 1 in job 1 zorka-0-8.local_35062 caused collective abort of all
> ranks
> exit status of rank 1: return code 1
> rank 0 in job 1 zorka-0-8.local_35062 caused collective abort of all
> ranks
> exit status of rank 0: return code 1
>
> Bcast
> 0: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> 1: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> rank 0 in job 2 zorka-0-8.local_35062 caused collective abort of all
> ranks
> exit status of rank 0: return code 1
>
> Bi directional BW
> 1: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> 0: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> rank 1 in job 3 zorka-0-8.local_35062 caused collective abort of all
> ranks
> exit status of rank 1: return code 1
> rank 0 in job 3 zorka-0-8.local_35062 caused collective abort of all
> ranks
> exit status of rank 0: return code 1
>
> BW
> 1: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> 0: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> rank 1 in job 4 zorka-0-8.local_35062 caused collective abort of all
> ranks
> exit status of rank 1: return code 1
> rank 0 in job 4 zorka-0-8.local_35062 caused collective abort of all
> ranks
> exit status of rank 0: return code 1
>
> Latency
> 0: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> rank 0 in job 5 zorka-0-8.local_35062 caused collective abort of all
> ranks
> exit status of rank 0: return code 1
>
> MBW MR
> 1: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> 0: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> rank 1 in job 6 zorka-0-8.local_35062 caused collective abort of all
> ranks
> exit status of rank 1: return code 1
> rank 0 in job 6 zorka-0-8.local_35062 caused collective abort of all
> ranks
> exit status of rank 0: killed by signal 9
> <<<<<<</snip>>>>>>>>>>
>
>
> [ahkumar at zorka-0-8 ~]$ sh
> sh-3.1$ ulimit -a
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> max nice                        (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 71680
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> max rt priority                 (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 71680
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss