[mvapich-discuss] Issues running mvapich2 with slurm

Matthew Russell matthew.g.russell at gmail.com
Mon Oct 22 14:26:12 EDT 2012


> > ./configure --prefix=/cm/shared/apps/mvapich2/pgi/64/1.8 \
> >    --with-pmi=slurm \
> >    --with-pm=no CPPFLAGS=-I/cm/shared/apps/slurm/2.4.2/slurm-2.4.2/ \
> >    LDFLAGS=-L/cm/shared/apps/slurm/2.4.2/lib/
>
>


> For more debugging information you may want to rebuild mvapich2 with
> the addition of `--enable-g=dbg --disable-fast' to the configure line.


Oh yeah, I actually built that version late Friday night but forgot about
it. I'll try re-running with that version.


> > When I try to run apps though, I get:
> > [matt at dena]~/cluster_tests% srun -n16 --mpi=none hello_mvapich2_slurm
> > srun: error: Unable to confirm allocation for job 14: Invalid job id specified
> > srun: Check SLURM_JOB_ID environment variable for expired or invalid job.
>
> I believe the above issue is related to slurm.  I can help you with the
> issue you have noted below.


Hmm, though I was confident it was set to unlimited, I think you're right
and it's not:

[matt at dena]~/cluster_tests%  srun -N 2 ulimit.sh
srun: error: Unable to confirm allocation for job 14: Invalid job id specified
srun: Check SLURM_JOB_ID environment variable for expired or invalid job.
[matt at dena]~/cluster_tests% salloc -N 2
salloc: Granted job allocation 19
[matt at dena]~/cluster_tests% srun -N 2 ulimit.sh
dena2: 64
dena1: 64
[matt at dena]~/cluster_tests%

I'll read the links you sent me, and see if I can put the "ulimit -l
unlimited" into sysconfig/slurm as you suggested.

Thanks!
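
For anyone following along, here is a rough sketch of a variant of the
ulimit.sh script quoted below that also flags low values like the 64 I am
seeing. The classify_memlock function name and the MIN_KB threshold are my
own placeholders (assumptions), not anything defined by MVAPICH2 or SLURM.

```shell
#!/bin/sh
# Sketch: print this host's locked-memory limit and flag low values.
# MIN_KB is an arbitrary illustrative threshold (assumption), not a
# documented MVAPICH2 requirement.
MIN_KB=65536

# classify_memlock VALUE
# Prints "ok" for "unlimited" or a number >= MIN_KB, "low" for a smaller
# number, and "unknown" for anything that is not a plain number.
classify_memlock() {
    case "$1" in
        unlimited) echo ok ;;
        ''|*[!0-9]*) echo unknown ;;
        *) if [ "$1" -ge "$MIN_KB" ]; then echo ok; else echo low; fi ;;
    esac
}

limit=$(ulimit -l)
host=$(hostname 2>/dev/null || uname -n)
echo "$host: $limit ($(classify_memlock "$limit"))"
```

It would be launched the same way as the original script, from inside an
allocation, for example with srun -N 2.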




> One thing that you may want to check is that `ulimit -l' returns
> unlimited (or some other value much higher than 64) on each host when
> using slurm.
>
>     [perkinjo at nowlab ~]$ srun -N 2 ulimit.sh
>     test2: unlimited
>     test1: unlimited
>     [perkinjo at nowlab ~]$ cat ulimit.sh
>     #!/bin/sh
>
>     echo $(hostname): $(ulimit -l)
>
> If the output is not unlimited you will probably have a CQ (completion
> queue) creation failure.  Take a look at the following section of our
> user guide.  You're also using slurm so I'm posting a link to their FAQ
> as well.
>
>
> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.8.html#x1-125000+9.4.3
> https://computing.llnl.gov/linux/slurm/faq.html#memlock
>
> Basically you'll want to make sure memlock is set to unlimited in
> /etc/security/limits.conf and that slurm is respecting this as well.  On
> our systems we have added `ulimit -l unlimited' into
> /etc/sysconfig/slurm (redhat systems).
>
> Hope this info helps.
>
> --
> Jonathan Perkins
> http://www.cse.ohio-state.edu/~perkinjo
>
>
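
To check the limits.conf half of the advice above, something like the
following sketch could be run on each node. It only looks at a single file
and ignores which domain (user or group) an entry applies to, so treat it as
a quick sanity check under those assumptions. The memlock_unlimited function
name is a placeholder of mine. It does not look at /etc/security/limits.d/
or at what slurmd itself was started with.

```shell
#!/bin/sh
# memlock_unlimited FILE
# Returns 0 if FILE contains uncommented memlock entries granting no limit
# for both the soft and hard types (a "-" type counts as both).
# pam_limits treats "unlimited", "infinity" and "-1" as no limit.
memlock_unlimited() {
    awk '
        /^[[:space:]]*#/ { next }
        $3 == "memlock" && ($4 == "unlimited" || $4 == "infinity" || $4 == "-1") {
            if ($2 == "soft" || $2 == "-") soft = 1
            if ($2 == "hard" || $2 == "-") hard = 1
        }
        END { exit (soft && hard) ? 0 : 1 }
    ' "$1"
}

# Default to the file named in the quoted advice; a different path can be
# passed as the first argument.
conf=${1:-/etc/security/limits.conf}
if [ -r "$conf" ] && memlock_unlimited "$conf"; then
    echo "$conf: memlock unlimited"
else
    echo "$conf: memlock not set to unlimited (or file unreadable)"
fi
```

Even when this reports unlimited, the srun check with ulimit.sh is still the
real test, since slurm has to pass the limit through to the job.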