[mvapich-discuss] (no subject)

Andy Riebs andy.riebs at hpe.com
Sun Nov 1 17:41:33 EST 2015


Jim,

Did you restart Slurm on the compute nodes after setting up 
/etc/sysconfig/slurm?

Also, in the shell you launch the job from, what does "ulimit -l" show? That
limit gets propagated to the compute nodes.
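
A minimal sketch of that check, assuming bash. The helper name check_memlock is hypothetical, and the 64 threshold is only the figure mentioned in the advice quoted below, not a documented minimum.

```shell
#!/bin/bash
# Classify a locked-memory limit as reported by `ulimit -l`
# (bash reports it in KiB, or as the word "unlimited").
# check_memlock is a hypothetical helper, not part of Slurm or MVAPICH2.
check_memlock() {
    local limit="$1"
    if [ "$limit" = "unlimited" ]; then
        echo "unlimited"
    elif [ "$limit" -le 64 ] 2>/dev/null; then
        # 64 is the value the quoted advice treats as too small.
        echo "too low: $limit KiB"
    else
        echo "above 64: $limit KiB"
    fi
}

# Limit in the current shell, i.e. the one the job is launched from.
check_memlock "$(ulimit -l)"
```

To see what a job step actually gets on the compute nodes, something like srun -n 2 bash -c 'ulimit -l' should print the limit once per task, assuming bash is available there.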

Andy

On 11/01/2015 05:02 PM, Jim Galarowicz wrote:
> Hi everyone,
>
> I'm running on a small cluster that has slurm and mvapich2 version 2.1
> installed.
> However, I'm seeing this error when I try to run a simple mpi application.
>
>     srun -n 2 --mpi=pmi2 ./nbody-mvapich2
>
>     In: PMI_Abort(1, Fatal error in MPI_Init:
>     Other MPI error, error stack:
>     MPIR_Init_thread(514).......:
>     MPID_Init(367)..............: channel initialization failed
>     MPIDI_CH3_Init(492).........:
>     MPIDI_CH3I_RDMA_init(224)...:
>     rdma_setup_startup_ring(410): cannot create cq
>     )
>     In: PMI_Abort(1, Fatal error in MPI_Init:
>     Other MPI error, error stack:
>     MPIR_Init_thread(514).......:
>     MPID_Init(367)..............: channel initialization failed
>     MPIDI_CH3_Init(492).........:
>     MPIDI_CH3I_RDMA_init(224)...:
>     rdma_setup_startup_ring(410): cannot create cq
>     )
>
> I searched the internet and found this url
> (http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2012-September/004027.html)
> on the "cannot create cq" issue, which suggests putting "ulimit -l unlimited"
> in /etc/sysconfig/slurm:
>
>> If it doesn't show unlimited (or some other number much higher than 64)
>> then you'll need to do something to update the limits slurm is using.
>> On redhat systems you can put the following in /etc/sysconfig/slurm.
>>
>>       ulimit -l unlimited
>
> So I created that file with the "ulimit -l unlimited" statement in it,
> but it didn't seem to make any difference.
>
> Does anyone have any hints on what might be wrong?
>
> Thank you,
> Jim G
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss


