[mvapich-discuss] Fwd: problem configuring mvapich2 with Slurm

Manuel Rodríguez Pascual manuel.rodriguez.pascual at gmail.com
Thu Nov 24 06:55:34 EST 2016


Hi all,

I am trying to make MVAPICH2 work with Slurm, but I keep running into some
issues. I know there are quite a lot of threads on the subject, but none of
them seems to solve my problem. My problem is that Slurm is executing two
serial jobs instead of a single parallel one.

Below I have included quite a lot of information about how I have
configured my cluster and the different tests that I have performed, in
case it helps.
---
---
COMPILATION

--- Slurm 17.02.0-0pre2:
./configure --prefix=/home/localsoft/slurm/

slurm.conf:
MpiDefault=none
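
For reference, the MPI plugin types this Slurm build knows about can be
listed directly with srun (the exact list depends on how Slurm was
configured):

$ srun --mpi=list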


--- MVAPICH mvapich2-2.2


After quite a lot of different tests, I've been able to compile MVAPICH2
with the following environment and options (config.log is attached to this
mail):

Environment vars:
LD_LIBRARY_PATH =
/usr/local/lib:/home/localsoft/slurm/lib:/home/localsoft/mvapich2/lib
(and some unrelated entries)
MPICHLIB_LDFLAGS='-Wl,-rpath,/home/localsoft/slurm/lib
-Wl,-rpath,/home/localsoft/mvapich2/lib'

Compilation:

./configure --prefix=/home/localsoft/mvapich2 --disable-mcast
--with-slurm=/home/localsoft/slurm --with-pmi=pmi2 --with-pm=slurm
--disable-romio
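
As a sanity check on the resulting build, mpiname (installed next to mpicc)
reports the MVAPICH2 version and the configure options it was actually
built with:

$ /home/localsoft/mvapich2/bin/mpiname -a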

Then, on every node of my cluster I have set LD_LIBRARY_PATH to the same
value.

My code is compiled with:
mpicc  helloWorldMPI.c -o helloWorldMPI -L/home/localsoft/slurm/lib
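
The source itself is not pasted here; roughly, helloWorldMPI.c is the usual
MPI hello world, something like the following (reconstructed only so that
the output below is easier to follow):

/* helloWorldMPI.c -- rough reconstruction, matching the output shown below */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelen);

    printf("Process %d of %d is on %s\n", rank, size, name);
    printf("Hello world from process %d of %d\n", rank, size);
    printf("Goodbye world from process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}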


---
---
EXECUTION:

--serial jobs: OK
$ ./helloWorldMPI
Process 0 of 1 is on acme31.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1


$ srun   ./helloWorldMPI
Process 0 of 1 is on acme11.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1

--parallel jobs: As you can see, Slurm is executing two serial jobs instead
of a single parallel one.

$ srun -n 2   ./helloWorldMPI
Process 0 of 1 is on acme11.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1

Process 0 of 1 is on acme11.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1


$ srun -n 2 --tasks-per-node=1   ./helloWorldMPI
Process 0 of 1 is on acme11.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1

Process 0 of 1 is on acme12.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1


--different Slurm MPI types:

$ srun -n 2 --mpi=none ./helloWorldMPI
Process 0 of 1 is on acme11.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1

Process 0 of 1 is on acme11.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1


$ srun -n 2 --mpi=mvapich ./helloWorldMPI
Process 0 of 1 is on acme11.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1

Process 0 of 1 is on acme11.ciemat.es
Hello world from process 0 of 1
Goodbye world from process 0 of 1

$ srun -n 2 --mpi=pmi2 ./helloWorldMPI
srun: error: task 0 launch failed: Unspecified error
srun: error: task 1 launch failed: Unspecified error
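
In case it helps with the diagnosis, the PMI/Slurm libraries the binary
pulls in at run time can be checked with ldd, e.g.:

$ ldd ./helloWorldMPI | grep -i -E 'pmi|slurm'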


---
---

Any clue on what's wrong?

Thanks for your help,


Manuel
Attachment: config.log (text/x-log, 814829 bytes)
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20161124/8ae0aeb9/attachment-0001.bin>

