[mvapich-discuss] cannot allocate CQ

Amit H Kumar AHKumar at odu.edu
Tue Jun 24 17:18:15 EDT 2008


Christian, that worked! Thanks for your help.
«Amit»
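
For anyone who finds this thread with the same error: a minimal sketch of a check that can be run inside an SGE job to see the locked-memory limit the job actually gets. The function names memlock_ok and check_memlock are made up for illustration, and treating anything other than "unlimited" as too low is an assumption based on the setting used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: accept only the "unlimited" setting used in this thread.
# $1 is a value as printed by `ulimit -l` ("unlimited" or a number in kbytes).
memlock_ok() {
    case "$1" in
        unlimited) return 0 ;;
        *)         return 1 ;;
    esac
}

# Report the locked-memory limit of the current process.
check_memlock() {
    limit=$(ulimit -l)
    if memlock_ok "$limit"; then
        echo "memlock OK: $limit"
    else
        echo "memlock too low: $limit kbytes (expected unlimited)"
    fi
}

check_memlock
```

Run from a login shell and again as a job script; if the two reports differ, the limit is being set for logins but not for the daemon that starts the jobs.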

Christian Guggenberger <christian.guggenberger at rzg.mpg.de> wrote on
06/24/2008 03:49:42 PM:

> On Tue, Jun 24, 2008 at 01:51:09PM -0400, Amit H Kumar wrote:
> >
> > Hi MVAPICH2-1.0.3,
> >
> > HCA: Mellanox InfiniHost III Lx HCA
> > IB Stack: Qlogic(SilverStorm)
> > Compiled MVAPICH2-1.0.3 using the Verbs API interface.
> >
> > Following the user guide, I added this line to /etc/security/limits.conf:
> >     * soft memlock unlimited
> > I also added the following line to /etc/init.d/sshd on all compute nodes
> > and then restarted sshd on all of them:
> >     ulimit -l unlimited
> >
> > I can run a simple hello world and the OSU benchmarks, but only when I
> > run them locally on the compute nodes as a regular user or root. When I
> > run the same programs as a user SGE job, they fail with the error
> > messages attached below:
>
> you'll have to add
> ulimit -l unlimited
>
> in your sge_execd startup script, as well, and restart that daemon.
>
> cheers.
>  - Christian
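
This helps because resource limits are inherited from the parent process. SGE jobs are started by sge_execd rather than through an sshd login, so they get whatever limit sge_execd itself was started with. A small sketch of that inheritance follows; the function name inherit_demo and the value of 16 kbytes are arbitrary choices for illustration, not anything from the MVAPICH2 or SGE documentation.

```shell
#!/bin/sh
# Hypothetical demo: a child process inherits the locked-memory limit
# of the process that starts it.
inherit_demo() {
    (
        # Try to set the soft limit inside this subshell only.
        # If the hard limit is lower the call fails and is ignored.
        ulimit -S -l "$1" 2>/dev/null
        parent=$(ulimit -S -l)
        # A freshly started shell reports the same value as its parent.
        child=$(sh -c 'ulimit -S -l')
        echo "parent=$parent child=$child"
    )
}

inherit_demo 16
```

The same logic applies to the daemon: a limit raised in the sge_execd startup script before the daemon is launched is the limit every job started by that daemon will see.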
> >
> > Also appended is the output of ulimit -a on the compute node.
> >
> > It seems this has been discussed on the list before, but I still don't
> > understand why running as an SGE job differs from running locally on
> > the compute node. Could it be the shell? Can anyone please help me dig
> > into this issue?
> >
> > Thank you,
> > «Amit»
> >
> > <<<<<<<snip>>>>>>>>>>
> > Tracing mpd's ... (this is a check to see mpd's have strated as expected)
> > zorka-0-8
> > zorka-0-8
> > Now Executing the my program ...
> >
> > ALL TO ALL
> > 0: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> > 1: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> > rank 1 in job 1  zorka-0-8.local_35062   caused collective abort of all ranks
> >   exit status of rank 1: return code 1
> > rank 0 in job 1  zorka-0-8.local_35062   caused collective abort of all ranks
> >   exit status of rank 0: return code 1
> >
> > Bcast
> > 0: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> > 1: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> > rank 0 in job 2  zorka-0-8.local_35062   caused collective abort of all ranks
> >   exit status of rank 0: return code 1
> >
> > Bi directional BW
> > 1: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> > 0: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> > rank 1 in job 3  zorka-0-8.local_35062   caused collective abort of all ranks
> >   exit status of rank 1: return code 1
> > rank 0 in job 3  zorka-0-8.local_35062   caused collective abort of all ranks
> >   exit status of rank 0: return code 1
> >
> > BW
> > 1: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> > 0: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> > rank 1 in job 4  zorka-0-8.local_35062   caused collective abort of all ranks
> >   exit status of rank 1: return code 1
> > rank 0 in job 4  zorka-0-8.local_35062   caused collective abort of all ranks
> >   exit status of rank 0: return code 1
> >
> > Latency
> > 0: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> > rank 0 in job 5  zorka-0-8.local_35062   caused collective abort of all ranks
> >   exit status of rank 0: return code 1
> >
> > MBW MR
> > 1: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> > 0: [rdma_iba_priv.c:624] error(-253): cannot allocate CQ
> > rank 1 in job 6  zorka-0-8.local_35062   caused collective abort of all ranks
> >   exit status of rank 1: return code 1
> > rank 0 in job 6  zorka-0-8.local_35062   caused collective abort of all ranks
> >   exit status of rank 0: killed by signal 9
> > <<<<<<</snip>>>>>>>>>>
> >
> >
> > [ahkumar@zorka-0-8 ~]$ sh
> > sh-3.1$ ulimit -a
> > core file size          (blocks, -c) 0
> > data seg size           (kbytes, -d) unlimited
> > max nice                        (-e) 0
> > file size               (blocks, -f) unlimited
> > pending signals                 (-i) 71680
> > max locked memory       (kbytes, -l) unlimited
> > max memory size         (kbytes, -m) unlimited
> > open files                      (-n) 1024
> > pipe size            (512 bytes, -p) 8
> > POSIX message queues     (bytes, -q) 819200
> > max rt priority                 (-r) 0
> > stack size              (kbytes, -s) 10240
> > cpu time               (seconds, -t) unlimited
> > max user processes              (-u) 71680
> > virtual memory          (kbytes, -v) unlimited
> > file locks                      (-x) unlimited
> >
> >
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss at cse.ohio-state.edu
> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>



