[mvapich-discuss] (no subject)

Jonathan Perkins perkinjo at cse.ohio-state.edu
Mon Nov 2 10:57:07 EST 2015


Hi Jim.  In addition to what Andy has suggested, you may want to try adding
the following lines to /etc/security/limits.conf on all machines:
* soft memlock unlimited
* hard memlock unlimited

After this, restart your sshd and slurm services.  This is related to the
following FAQ item in our user guide:
http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1-userguide.html#x1-1380009.4.3
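
As a quick sanity check (the hostname below is just an example taken from
your srun output), once limits.conf is updated and sshd restarted, a fresh
login on each compute node should report unlimited, and so should tasks
launched through slurm.  A rough sketch:

    # a new ssh session should pick up the new memlock limit
    # (via pam_limits on a typical Red Hat setup)
    ssh ccn001 'ulimit -l'
    # tasks launched through slurmd should report it as well
    srun -n 2 --mpi=pmi2 sh -c 'echo $(hostname): $(ulimit -l)'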

Please let us know if this helps.

On Mon, Nov 2, 2015 at 10:47 AM Andy Riebs <andy.riebs at hpe.com> wrote:

> Hi Jim,
>
> I assume you did, but just in case... did you restart slurm on the
> compute nodes, as well?
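>
> If not, something along these lines should do it.  This is just a sketch:
> the node names are made up, and it assumes the init script shown below
> accepts "restart" and that ssh/sudo to the computes works for you:
>
>     # restart slurmd on every compute node so it picks up the new limits
>     for n in ccn001 ccn002; do
>         ssh "$n" "sudo /etc/init.d/slurm restart"
>     done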
>
> Andy
>
> On 11/02/2015 10:42 AM, Jim Galarowicz wrote:
> > Hi Andy,
> >
> > Thanks for the reply.
> >
> > I restarted slurm with this command:
> >
> > $ sudo /etc/init.d/slurm start
> > [sudo] password for jeg:
> > starting slurmctld:
> >
> > $ !sru
> > srun -n 2 --mpi=pmi2 ulimit.sh
> > ccn001.cc.nx: 64
> > ccn001.cc.nx: 64
> >
> > $  cat ulimit.sh
> > #!/bin/sh
> >     echo $(hostname): $(ulimit -l)
> >
> >
> > It looks like I'm still not getting unlimited on the compute nodes, but
> > when I do the salloc and run ulimit -l, I see unlimited.
> >
> > [jeg at hdn nbody]$ ulimit -l
> > unlimited
> >
> >
> > [jeg at hdn nbody]$ cat   /etc/sysconfig/slurm
> > ulimit -l unlimited
> >
> > Do you see anything wrong in what I'm doing?
> >
> > Thanks again for the reply!
> >
> > Jim G
> >
> > On 11/01/2015 02:41 PM, Andy Riebs wrote:
> >> Jim,
> >>
> >> Did you restart Slurm on the compute nodes after setting up
> >> /etc/sysconfig/slurm?
> >>
> >> Also, in your local job, what does "ulimit -l" show? That will get
> >> propagated to the computes.
> >>
> >> Andy
> >>
> >> On 11/01/2015 05:02 PM, Jim Galarowicz wrote:
> >>> Hi everyone,
> >>>
> >>> I'm running on a small cluster that has slurm and mvapich2 version 2.1
> >>> installed.
> >>> However, I'm seeing this error when I try to run a simple MPI
> >>> application.
> >>>
> >>>      srun -n 2 --mpi=pmi2 ./nbody-mvapich2
> >>>
> >>>      In: PMI_Abort(1, Fatal error in MPI_Init:
> >>>      Other MPI error, error stack:
> >>>      MPIR_Init_thread(514).......:
> >>>      MPID_Init(367)..............: channel initialization failed
> >>>      MPIDI_CH3_Init(492).........:
> >>>      MPIDI_CH3I_RDMA_init(224)...:
> >>>      rdma_setup_startup_ring(410): cannot create cq
> >>>      )
> >>>      In: PMI_Abort(1, Fatal error in MPI_Init:
> >>>      Other MPI error, error stack:
> >>>      MPIR_Init_thread(514).......:
> >>>      MPID_Init(367)..............: channel initialization failed
> >>>      MPIDI_CH3_Init(492).........:
> >>>      MPIDI_CH3I_RDMA_init(224)...:
> >>>      rdma_setup_startup_ring(410): cannot create cq
> >>>      )
> >>>
> >>>
> >>>
> >>> I searched the internet and found this URL
> >>> (http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2012-September/004027.html)
> >>>
> >>> on the "cannot create cq" issue, which suggests we need to set
> >>>
> >>> ulimit -l unlimited  in  /etc/sysconfig/slurm
> >>>
> >>>> If it doesn't show unlimited (or some other number much higher than 64)
> >>>> then you'll need to do something to update the limits slurm is using.
> >>>> On redhat systems you can put the following in /etc/sysconfig/slurm.
> >>>>
> >>>>       ulimit -l unlimited
> >>> So, I created that file with the "ulimit -l unlimited" line in it.
> >>> But it didn't seem to make any difference on the issue.
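> >>>
> >>> To check what limit the slurm-launched tasks actually see (rather than
> >>> my login shell), I figure I can run something like:
> >>>
> >>>     # prints the locked-memory limit as seen by tasks started by slurmd
> >>>     srun -n 2 --mpi=pmi2 sh -c 'echo $(hostname): $(ulimit -l)'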
> >>>
> >>> Does anyone have any hints on what might be wrong?
> >>>
> >>> Thank you,
> >>> Jim G
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>

