[mvapich-discuss] (no subject)
Jim Galarowicz
jeg at krellinst.org
Mon Nov 2 10:42:57 EST 2015
Hi Andy,
Thanks for the reply.
I restarted slurm with this command:
$ sudo /etc/init.d/slurm start
[sudo] password for jeg:
starting slurmctld:
$ !sru
srun -n 2 --mpi=pmi2 ulimit.sh
ccn001.cc.nx: 64
ccn001.cc.nx: 64
$ cat ulimit.sh
#!/bin/sh
echo $(hostname): $(ulimit -l)
It looks like I'm still not getting an unlimited locked-memory limit on the
compute nodes, but when I do the salloc and run ulimit -l, I see unlimited.
[jeg at hdn nbody]$ ulimit -l
unlimited
[jeg at hdn nbody]$ cat /etc/sysconfig/slurm
ulimit -l unlimited
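One thing I realize might matter (my understanding, not something I've verified
in the Slurm sources): resource limits are per-process and inherited from the
parent, so the tasks srun launches take their memlock limit from the slurmd
daemon running on each compute node, not from my login shell. A quick local
demonstration of that inheritance, with no Slurm involved:

```shell
# A child process inherits its parent's locked-memory limit: lowering it in
# the parent shell caps every descendant it spawns, just as slurmd's limit
# caps the tasks srun launches on a compute node.
sh -c 'ulimit -l 64; sh -c "ulimit -l"'
```

If that's right, then restarting only slurmctld on the head node (as in my
transcript above) wouldn't be enough; slurmd on each compute node would need
a restart too, so it re-reads /etc/sysconfig/slurm. That restart step is an
assumption on my part.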
Do you see anything wrong in what I'm doing?
Thanks again for the reply!
Jim G
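P.S. In case it helps anyone searching later: my understanding is that whether
the submitting shell's limits reach the tasks is also controlled in slurm.conf
by PropagateResourceLimits. The lines below are only an illustration of that
parameter, not our actual config:

```
# slurm.conf (illustrative, not our actual config)
PropagateResourceLimits=ALL     # default: propagate the submitting shell's rlimits
# PropagateResourceLimits=NONE  # tasks keep the slurmd daemon's limits instead
```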
On 11/01/2015 02:41 PM, Andy Riebs wrote:
> Jim,
>
> Did you restart Slurm on the compute nodes after setting up
> /etc/sysconfig/slurm?
>
> Also, in your local job, what does "ulimit -l" show? That will get
> propagated to the computes.
>
> Andy
>
> On 11/01/2015 05:02 PM, Jim Galarowicz wrote:
>> Hi everyone,
>>
>> I'm running on a small cluster that has Slurm and MVAPICH2 2.1
>> installed.
>> However, I'm seeing this error when I try to run a simple MPI
>> application.
>>
>> srun -n 2 --mpi=pmi2 ./nbody-mvapich2
>>
>> In: PMI_Abort(1, Fatal error in MPI_Init:
>> Other MPI error, error stack:
>> MPIR_Init_thread(514).......:
>> MPID_Init(367)..............: channel initialization failed
>> MPIDI_CH3_Init(492).........:
>> MPIDI_CH3I_RDMA_init(224)...:
>> rdma_setup_startup_ring(410): cannot create cq
>> )
>> In: PMI_Abort(1, Fatal error in MPI_Init:
>> Other MPI error, error stack:
>> MPIR_Init_thread(514).......:
>> MPID_Init(367)..............: channel initialization failed
>> MPIDI_CH3_Init(492).........:
>> MPIDI_CH3I_RDMA_init(224)...:
>> rdma_setup_startup_ring(410): cannot create cq
>> )
>>
>>
>>
>> I searched the internet and found this URL
>> (http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2012-September/004027.html)
>> on the "cannot create cq" issue, which suggests setting
>>
>> ulimit -l unlimited in /etc/sysconfig/slurm
>>
>>> If it doesn't show unlimited (or some other number much higher than 64)
>>> then you'll need to do something to update the limits slurm is using.
>>> On redhat systems you can put the following in /etc/sysconfig/slurm.
>>>
>>> ulimit -l unlimited
>> So, I created that file with the "ulimit -l unlimited" line, but it
>> didn't seem to make any difference.
>>
>> Does anyone have any hints on what might be wrong?
>>
>> Thank you,
>> Jim G
>>
>>
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>