[mvapich-discuss] Help! Problems with Slurm and MVAPICH2

Jonathan Perkins perkinjo at cse.ohio-state.edu
Thu Sep 13 07:59:12 EDT 2012


On Thu, Sep 13, 2012 at 09:58:59AM +0200, José Manuel Molero wrote:
> Hello,

Hi, my reply is inline.

> We have a new cluster with an InfiniBand network, and I think Slurm and MVAPICH2 would be the best option in this case.
> 
> I have configured SLURM 2.3.2 on Ubuntu Server and it works.
> 
> Now I have tried to install MVAPICH2 1.8 with the following:
> 
> ./configure --with-pm=none --with-pmi=slurm ; make ; make install (on the front end and all the compute nodes)

Looks good so far.

> 
> But it doesn't work.
> 
> I compile using:
> 
> mpicc file.c -o file -lpmi -L/usr/include/slurm/

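The extra linking options should be unnecessary: since MVAPICH2 was configured with --with-pmi=slurm, the mpicc wrapper should already take care of PMI.  Try just using: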

    mpicc file.c -o file

> 
> and then:
> 
>  srun -N2 file
> 
> And the result is:
> 
> In: PMI_Abort(1, Fatal error in MPI_Init:
> Other MPI error
> )
> In: PMI_Abort(1, Fatal error in MPI_Init:
> Other MPI error
> )
> srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
> srun: error: node17: task 1: Exited with exit code 1
> slurmd[node16]: *** STEP 102.0 KILLED AT 2012-09-13T09:56:18 WITH SIGNAL 9 ***
> srun: error: node16: task 0: Exited with exit code 1
> slurmd[node16]: *** STEP 102.0 KILLED AT 2012-09-13T09:56:18 WITH SIGNAL 9 ***
> 
> And here is the output of mpiname -a:
> 
> MVAPICH2 1.8 Mon Apr 30 14:56:40 EDT 2012 ch3:mrail
> 
> Compilation
> CC: gcc    -DNDEBUG -DNVALGRIND -O2
> CXX: c++   -DNDEBUG -DNVALGRIND -O2
> F77: gfortran   -O2 
> FC: gfortran   -O2
> 
> Configuration
> --with-pm=none --with-pmi=slurm
> 
> What am I doing wrong?

I think the only thing tripping you up is the direct linking to Slurm's
PMI library.  Let us know how it goes when you try the command without
those linking options.

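Another thing you may want to check is that `ulimit -l' (the locked
memory limit) returns unlimited, or some other value much higher than 64,
on each host when jobs are launched through Slurm.  InfiniBand needs to
lock (register) memory, so a low limit can make MPI_Init fail.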

    [perkinjo at nowlab ~]$ srun -N 2 ulimit.sh
    test2: unlimited
    test1: unlimited
    [perkinjo at nowlab ~]$ cat ulimit.sh
    #!/bin/sh

    echo $(hostname): $(ulimit -l)
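
On a larger cluster it may help to have the script flag suspicious values
instead of just printing them.  The following is a minimal sketch, not part
of Slurm or MVAPICH2: the function name classify_memlock and the labels
"ok", "too-low", "review" and "unknown" are made up for illustration, and
the 64 KB cut-off simply mirrors the advice above ("much higher" is left to
your judgment):

```sh
#!/bin/sh
# Hypothetical extension of ulimit.sh: classify the locked-memory limit
# reported by `ulimit -l` (in KB, or "unlimited") on this host.
classify_memlock() {
    case "$1" in
        unlimited)
            echo "ok" ;;
        ''|*[!0-9]*)
            # Empty or non-numeric output: nothing sensible to compare.
            echo "unknown" ;;
        *)
            # 64 KB is the low value warned about above; this cut-off is an
            # assumption, and larger values still deserve a human look.
            if [ "$1" -le 64 ]; then
                echo "too-low"
            else
                echo "review"
            fi ;;
    esac
}

limit=$(ulimit -l)
echo "$(hostname): $limit ($(classify_memlock "$limit"))"
```

It can be launched the same way as ulimit.sh, e.g. `srun -N 2
check_memlock.sh' (the script name is hypothetical).  Any node reporting
too-low is a possible cause of the MPI_Init failure.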

For more debugging information, you may want to rebuild MVAPICH2 with
`--enable-g=dbg --disable-fast' added to the configure line.
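
Applied to the configure line from the original post, the debug rebuild
would look roughly like this; as with the first build, it would need to be
done (or the result installed) on the front end and every compute node:

```sh
./configure --with-pm=none --with-pmi=slurm --enable-g=dbg --disable-fast
make
make install
```
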
Hope this info helps.

-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

