[mvapich-discuss] slurm+mvapich2 -> cannot create cq ?

Jonathan Perkins perkinjo at cse.ohio-state.edu
Thu Sep 6 13:30:13 EDT 2012


Hi, it looks like you checked ulimit -l outside of srun (ssh goes
through your login environment, while tasks launched by srun are
children of slurmd).  It's possible that you're getting a lower
locked-memory limit because of the limits that were in effect when the
slurm service started.

Can you run the following?

    [perkinjo at nowlab ~]$ srun -N 2 ulimit.sh
    test2: unlimited
    test1: unlimited
    [perkinjo at nowlab ~]$ cat ulimit.sh
    #!/bin/sh

    # report this node's max locked memory limit
    echo $(hostname): $(ulimit -l)
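
If you'd rather check from a compiled program (closer to what MPI_Init
actually sees), the underlying value is the RLIMIT_MEMLOCK rlimit.  A
minimal sketch, just for illustration:

    /* memlock.c: print this process's max locked memory limit */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }
        if (rl.rlim_cur == RLIM_INFINITY)
            printf("memlock: unlimited\n");
        else
            printf("memlock: %llu bytes\n",
                   (unsigned long long)rl.rlim_cur);
        return 0;
    }

Running it with srun -N 2 should agree with the shell script above.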

If it doesn't show unlimited (or some other number much higher than 64)
then you'll need to raise the locked-memory limit that slurm is using.
On Red Hat systems you can put the following in /etc/sysconfig/slurm
(the init script sources that file before launching slurmd) and then
restart the slurm service:

    ulimit -l unlimited
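
By the way, the "cannot create cq" in your trace means MVAPICH2 failed
to create an InfiniBand completion queue during startup.  CQ buffers
must be pinned in physical memory, so creation typically fails when
RLIMIT_MEMLOCK is small (the common 64 kB default is far too low).  You
can trigger the same failure without MPI using a short libibverbs
program; a minimal sketch, assuming the first device listed is your HCA
(compile with -libverbs, then lower the limit with ulimit -l 64 before
running to reproduce):

    /* cqtest.c: try to create a completion queue on the first HCA */
    #include <stdio.h>
    #include <infiniband/verbs.h>

    int main(void)
    {
        int num;
        struct ibv_device **devs = ibv_get_device_list(&num);
        struct ibv_context *ctx;
        struct ibv_cq *cq;

        if (!devs || num == 0) {
            fprintf(stderr, "no RDMA devices found\n");
            return 1;
        }
        ctx = ibv_open_device(devs[0]);
        if (!ctx) {
            fprintf(stderr, "ibv_open_device failed\n");
            return 1;
        }
        /* with a low memlock limit the kernel refuses to pin the
         * CQ buffer and this returns NULL */
        cq = ibv_create_cq(ctx, 4096, NULL, NULL, 0);
        if (!cq) {
            fprintf(stderr, "cannot create cq\n");
            return 1;
        }
        printf("cq created ok\n");

        ibv_destroy_cq(cq);
        ibv_close_device(ctx);
        ibv_free_device_list(devs);
        return 0;
    }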


On Thu, Sep 06, 2012 at 05:56:05PM +0300, Evren Yurtesen IB wrote:
> Hello,
> I am trying to run a simple hello.c program with slurm+mvapich2.
> I have searched a lot but could not find a solution to my problem.
> The closest matches were:
> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2008-December/002075.html
> and
> http://www.mail-archive.com/ewg@lists.openfabrics.org/msg11087.html
> but I already have the memlock limits set to unlimited...
> 
> Any ideas on what might be going wrong?
> 
> Running 2 processes on the same node works fine...
> 
> 
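The hello.c source didn't make it into the post, but any minimal MPI
hello world that prints rank and hostname will reproduce this.  A
sketch consistent with the output shown below (the trailing "Ready"
line is assumed to come from rank 0 after a barrier):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(name, &len);
        printf("Hello World from process %d running on %s\n",
               rank, name);

        /* assumption: rank 0 prints "Ready" once all ranks arrive */
        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 0)
            printf("Ready\n");

        MPI_Finalize();
        return 0;
    }
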
> -bash-4.1$ mpich2version
> MVAPICH2 Version:     	1.8
> MVAPICH2 Release date:	Mon Apr 30 14:50:19 EDT 2012
> MVAPICH2 Device:      	ch3:mrail
> MVAPICH2 configure:
> 	--prefix=/export/modules/apps/mvapich2/1.8/gnu
> --with-valgrind=/export/modules/tools/valgrind/3.8.0/include/valgrind
> --enable-fast=O3 --enable-shared --enable-mpe --with-pmi=slurm
> --with-pm=no
> MVAPICH2 CC:  	gcc -O3 -march=corei7 -mtune=corei7   -O3
> MVAPICH2 CXX: 	c++ -O3 -march=corei7 -mtune=corei7  -O3
> MVAPICH2 F77: 	gfortran -O3 -march=corei7 -mtune=corei7  -O3
> MVAPICH2 FC:  	gfortran -O3 -march=corei7 -mtune=corei7  -O3
> 
> -bash-4.1$ mpicc hello.c -lpmi -lslurm
> 
> -bash-4.1$ srun -n 2  ./a.out
> srun: job 23458 queued and waiting for resources
> srun: job 23458 has been allocated resources
> Hello World from process 0 running on asg1
> Hello World from process 1 running on asg1
> Ready
> 
> -bash-4.1$ srun -N 2  ./a.out
> srun: job 23454 queued and waiting for resources
> srun: job 23454 has been allocated resources
> In: PMI_Abort(1, Fatal error in MPI_Init:
> Other MPI error, error stack:
> MPIR_Init_thread(408).......:
> MPID_Init(296)..............: channel initialization failed
> MPIDI_CH3_Init(283).........:
> MPIDI_CH3I_RDMA_init(172)...:
> rdma_setup_startup_ring(431): cannot create cq
> )
> In: PMI_Abort(1, Fatal error in MPI_Init:
> Other MPI error, error stack:
> MPIR_Init_thread(408).......:
> MPID_Init(296)..............: channel initialization failed
> MPIDI_CH3_Init(283).........:
> MPIDI_CH3I_RDMA_init(172)...:
> rdma_setup_startup_ring(431): cannot create cq
> )
> srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
> slurmd[asg1]: error: *** STEP 23454.0 KILLED AT 2012-09-06T17:52:06
> WITH SIGNAL 9 ***
> slurmd[asg2]: error: *** STEP 23454.0 KILLED AT 2012-09-06T17:52:06
> WITH SIGNAL 9 ***
> slurmd[asg1]: error: *** STEP 23454.0 KILLED AT 2012-09-06T17:52:06
> WITH SIGNAL 9 ***
> slurmd[asg2]: error: *** STEP 23454.0 KILLED AT 2012-09-06T17:52:06
> WITH SIGNAL 9 ***
> slurmd[asg2]: error: *** STEP 23454.0 KILLED AT 2012-09-06T17:52:06
> WITH SIGNAL 9 ***
> slurmd[asg1]: error: *** STEP 23454.0 KILLED AT 2012-09-06T17:52:06
> WITH SIGNAL 9 ***
> srun: error: asg1: task 0: Exited with exit code 1
> srun: Terminating job step 23454.0
> srun: error: asg2: task 1: Exited with exit code 1
> slurmd[asg1]: error: *** STEP 23454.0 KILLED AT 2012-09-06T17:52:06
> WITH SIGNAL 9 ***
> slurmd[asg2]: error: *** STEP 23454.0 KILLED AT 2012-09-06T17:52:06
> WITH SIGNAL 9 ***
> 
> -bash-4.1$ ulimit -l
> unlimited
> -bash-4.1$ ssh asg1 -C 'ulimit -l'
> unlimited
> -bash-4.1$ ssh asg2 -C 'ulimit -l'
> unlimited
> -bash-4.1$

-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

