[mvapich-discuss] (no subject)
Andy Riebs
andy.riebs at hpe.com
Sun Nov 1 17:41:33 EST 2015
Jim,
Did you restart Slurm on the compute nodes after setting up
/etc/sysconfig/slurm?
Also, in the shell you launch the job from, what does "ulimit -l" show?
That limit gets propagated to the compute nodes.
Andy
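
A minimal sketch of the check suggested above, assuming bash on the submitting node and that srun can launch a plain shell command on this cluster. The helper name classify_memlock and its labels are hypothetical; the 64 kB cutoff is taken from the advice quoted later in this thread. It prints the locked-memory limit of the submitting shell and, if srun is on the PATH, the limit seen inside a two-task job step.

```shell
#!/bin/bash
# Hypothetical helper: label a value reported by "ulimit -l"
# (kB, or the literal word "unlimited").
classify_memlock() {
    case "$1" in
        unlimited)   echo "unlimited" ;;
        ''|*[!0-9]*) echo "unknown" ;;      # empty or not a plain number
        *)
            if [ "$1" -gt 64 ]; then
                echo "raised"               # above the 64 kB mentioned in the thread
            else
                echo "low"                  # 64 kB or less
            fi
            ;;
    esac
}

# Limit in the shell the job is submitted from.
local_limit=$(ulimit -l)
echo "submit shell: ${local_limit} ($(classify_memlock "${local_limit}"))"

# Limit as seen by each task of a job step; skipped when srun is not installed.
if command -v srun >/dev/null 2>&1; then
    srun -n 2 bash -c 'echo "$(hostname): $(ulimit -l)"'
fi
```

If the submit shell reports a small value while the job step also reports a small value, the limit in effect for the job is the likely suspect; if only the job step is low, the Slurm daemons on the compute nodes are probably still running with the old limit.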
On 11/01/2015 05:02 PM, Jim Galarowicz wrote:
> Hi everyone,
>
> I'm running on a small cluster that has Slurm and MVAPICH2 version 2.1
> installed.
> However, I'm seeing this error when I try to run a simple MPI application.
>
> srun -n 2 --mpi=pmi2 ./nbody-mvapich2
>
> In: PMI_Abort(1, Fatal error in MPI_Init:
> Other MPI error, error stack:
> MPIR_Init_thread(514).......:
> MPID_Init(367)..............: channel initialization failed
> MPIDI_CH3_Init(492).........:
> MPIDI_CH3I_RDMA_init(224)...:
> rdma_setup_startup_ring(410): cannot create cq
> )
> In: PMI_Abort(1, Fatal error in MPI_Init:
> Other MPI error, error stack:
> MPIR_Init_thread(514).......:
> MPID_Init(367)..............: channel initialization failed
> MPIDI_CH3_Init(492).........:
> MPIDI_CH3I_RDMA_init(224)...:
> rdma_setup_startup_ring(410): cannot create cq
> )
>
> I searched the internet and found this URL
> (http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2012-September/004027.html)
> on the "cannot create cq" issue, which suggests we need to set
>
> ulimit -l unlimited in /etc/sysconfig/slurm
>
>> If it doesn't show unlimited (or some other number much higher than 64)
>> then you'll need to do something to update the limits slurm is using.
>> On redhat systems you can put the following in /etc/sysconfig/slurm.
>>
>> ulimit -l unlimited
> So I created that file with the "ulimit -l unlimited" statement in it,
> but it didn't make any difference to the issue.
>
> Does anyone have any hints on what might be wrong?
>
> Thank you,
> Jim G