[mvapich-discuss] (no subject)

Jim Galarowicz jeg at krellinst.org
Mon Nov 2 10:42:57 EST 2015


Hi Andy,

Thanks for the reply.

I restarted slurm with this command:

$ sudo /etc/init.d/slurm start
[sudo] password for jeg:
starting slurmctld:
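
One thing I notice is that the init script above only reports slurmctld
starting, i.e. the controller on the head node. If the slurmd daemons on the
compute nodes are still running with the old 64 KB memlock limit, the job
steps they launch would inherit it. A rough sketch of restarting them
(ccn001.cc.nx is taken from the srun output below; the node range and pdsh
being available are just assumptions about this setup):

$ ssh ccn001.cc.nx 'sudo /etc/init.d/slurm restart'        # one node at a time
$ sudo pdsh -w ccn[001-002].cc.nx /etc/init.d/slurm restart   # or all nodes at once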

$ !sru
srun -n 2 --mpi=pmi2 ulimit.sh
ccn001.cc.nx: 64
ccn001.cc.nx: 64

$  cat ulimit.sh
#!/bin/sh
     echo $(hostname): $(ulimit -l)
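
As a cross-check on the daemon side (just a sketch, not something verified
here; the node name comes from the srun output above), the limits that the
running slurmd holds can be read out of /proc, since the job steps it spawns
inherit them:

$ ssh ccn001.cc.nx 'grep "Max locked memory" /proc/$(pgrep -o slurmd)/limits'

If that still reports 65536 bytes (= 64 KB), the daemon never picked up the
new setting and needs a restart on that node.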


It looks like I'm still not getting an unlimited locked-memory limit on the
compute nodes, but when I do the salloc and run ulimit -l, I see unlimited.

[jeg at hdn nbody]$ ulimit -l
unlimited


[jeg at hdn nbody]$ cat   /etc/sysconfig/slurm
ulimit -l unlimited
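
For completeness, two other places the 64 KB cap could be coming from (these
are guesses about the setup, not something I have verified here): the PAM
limits on the compute nodes, e.g. in /etc/security/limits.conf

    *  soft  memlock  unlimited
    *  hard  memlock  unlimited

and Slurm's limit propagation in slurm.conf, which controls whether the
memlock limit of the shell where srun/salloc runs is copied onto the job,
e.g.

    PropagateResourceLimitsExcept=MEMLOCK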

Do you see anything wrong in what I'm doing?

Thanks again for the reply!

Jim G

On 11/01/2015 02:41 PM, Andy Riebs wrote:
> Jim,
>
> Did you restart Slurm on the compute nodes after setting up 
> /etc/sysconfig/slurm?
>
> Also, in your local job, what does "ulimit -l" show? That will get 
> propagated to the computes.
>
> Andy
>
> On 11/01/2015 05:02 PM, Jim Galarowicz wrote:
>>
>> Hi everyone,
>>
>> I'm running on a small cluster that has Slurm and MVAPICH2 version 2.1
>> installed. However, I'm seeing this error when I try to run a simple MPI
>> application.
>>
>>      srun -n 2 --mpi=pmi2 ./nbody-mvapich2
>>
>>      In: PMI_Abort(1, Fatal error in MPI_Init:
>>      Other MPI error, error stack:
>>      MPIR_Init_thread(514).......:
>>      MPID_Init(367)..............: channel initialization failed
>>      MPIDI_CH3_Init(492).........:
>>      MPIDI_CH3I_RDMA_init(224)...:
>>      rdma_setup_startup_ring(410): cannot create cq
>>      )
>>      In: PMI_Abort(1, Fatal error in MPI_Init:
>>      Other MPI error, error stack:
>>      MPIR_Init_thread(514).......:
>>      MPID_Init(367)..............: channel initialization failed
>>      MPIDI_CH3_Init(492).........:
>>      MPIDI_CH3I_RDMA_init(224)...:
>>      rdma_setup_startup_ring(410): cannot create cq
>>      )
>>
>>
>>
>> I searched the internet and found this URL
>> (http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2012-September/004027.html)
>> on the "cannot create cq" issue, which suggests we need to set
>>
>> ulimit -l unlimited  in  /etc/sysconfig/slurm
>>
>>> If it doesn't show unlimited (or some other number much higher than 64)
>>> then you'll need to do something to update the limits slurm is using.
>>> On redhat systems you can put the following in /etc/sysconfig/slurm.
>>>
>>>       ulimit -l unlimited
>> So, I created that file with the "ulimit -l unlimited" line in it.
>> But it didn't seem to make any difference.
>>
>> Does anyone have any hints on what might be wrong?
>>
>> Thank you,
>> Jim G
>>
>>
>>
>>
>>
>



More information about the mvapich-discuss mailing list