[mvapich-discuss] Fwd: problem configuring mvapich2 with Slurm
Manuel Rodríguez Pascual
manuel.rodriguez.pascual at gmail.com
Thu Nov 24 06:55:34 EST 2016
Hi all,
I am trying to make MVAPICH2 work with Slurm, but I keep running into
issues. I know there are quite a few threads on the subject, but none of
them seems to solve my problem: Slurm is executing two serial jobs
instead of a single parallel one.
Below I have included quite a lot of information about how I have
configured my cluster and the different tests that I have performed, in
case it helps.
---
---
COMPILATION
--- Slurm 17.02.0-0pre2:
./configure --prefix=/home/localsoft/slurm/
slurm.conf:
MpiDefault=none
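(For reference, my understanding -- and I may be wrong here -- is that when
MVAPICH2 is built against PMI2, slurm.conf would normally point the default
MPI plugin at pmi2 rather than none. This is an assumption on my part, not
what I currently have configured:)

```
# slurm.conf -- hypothetical setting, not my current configuration
MpiDefault=pmi2
```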
--- MVAPICH2 2.2
After quite a lot of different tests, I've been able to compile MVAPICH2
with the following environment and options (config.log is attached to this
mail):
Environment vars:
LD_LIBRARY_PATH=/usr/local/lib:/home/localsoft/slurm/lib:/home/localsoft/mvapich2/lib
(plus some unrelated entries)
MPICHLIB_LDFLAGS='-Wl,-rpath,/home/localsoft/slurm/lib -Wl,-rpath,/home/localsoft/mvapich2/lib'
Compilation:
./configure --prefix=/home/localsoft/mvapich2 --disable-mcast \
    --with-slurm=/home/localsoft/slurm --with-pmi=pmi2 --with-pm=slurm \
    --disable-romio
Then, on every node of my cluster, I have set LD_LIBRARY_PATH to the same
value.
My code is compiled with:
mpicc helloWorldMPI.c -o helloWorldMPI -L/home/localsoft/slurm/lib
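(For completeness, since I did not paste the source: helloWorldMPI.c is a
minimal MPI hello world along these lines. This is a reconstruction that
matches the output shown in the execution section below, not necessarily
the exact file:)

```c
/* helloWorldMPI.c -- minimal reconstruction of the test program,
   assumed to match the output shown in the execution logs below */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
    MPI_Get_processor_name(name, &namelen);

    printf("Process %d of %d is on %s\n", rank, size, name);
    printf("Hello world from process %d of %d\n", rank, size);
    printf("Goodbye world from process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```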
---
---
EXECUTION:
--serial jobs: OK
$ ./helloWorldMPI
Process 0 of 1 is on acme31.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1
$ srun ./helloWorldMPI
Process 0 of 1 is on acme11.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1
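(One sanity check I can think of, to separate Slurm's task launch from the
MPI wire-up: Slurm exports SLURM_PROCID and SLURM_NTASKS to every task it
launches, so even a non-MPI command can show whether srun really created
two tasks. This is my own diagnostic idea, not something I have from the
MVAPICH2 docs:)

```shell
# Prints this task's Slurm rank and the task count; outside of srun
# the variables are unset, so it falls back to "task 0 of 1".
echo "task ${SLURM_PROCID:-0} of ${SLURM_NTASKS:-1}"
```

Run under srun -n 2 (e.g. srun -n 2 bash -c '...') this should print two
distinct lines, "task 0 of 2" and "task 1 of 2", if task launch itself is
working.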
--parallel jobs: as you can see, Slurm is executing two serial jobs instead
of a single parallel one.
$ srun -n 2 ./helloWorldMPI
Process 0 of 1 is on acme11.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1
Process 0 of 1 is on acme11.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1
$ srun -n 2 --tasks-per-node=1 ./helloWorldMPI
Process 0 of 1 is on acme11.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1
Process 0 of 1 is on acme12.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1
--different Slurm MPI types:
$ srun -n 2 --mpi=none ./helloWorldMPI
Process 0 of 1 is on acme11.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1
Process 0 of 1 is on acme11.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1
$ srun -n 2 --mpi=mvapich ./helloWorldMPI
Process 0 of 1 is on acme11.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1
Process 0 of 1 is on acme11.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1
$ srun -n 2 --mpi=pmi2 ./helloWorldMPI
srun: error: task 0 launch failed: Unspecified error
srun: error: task 1 launch failed: Unspecified error
---
---
Any clue on what's wrong?
Thanks for your help,
Manuel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.log
Type: text/x-log
Size: 814829 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20161124/8ae0aeb9/attachment-0001.bin>