[mvapich-discuss] Fwd: problem configuring mvapich2 with Slurm

Sourav Chakraborty chakraborty.52 at buckeyemail.osu.edu
Thu Nov 24 10:50:25 EST 2016


Hi Manuel,

Thanks for reporting the issue.

Since MVAPICH2 was configured with PMI2, the following is the correct way to
launch jobs:
srun -n 2 --mpi=pmi2 ./helloMPI
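
Once that works, you can also make PMI2 the default in slurm.conf so the
--mpi flag is no longer needed (assuming you want it as the cluster-wide
default):
MpiDefault=pmi2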

Can you please post the output of the following command? It will provide more
information to help identify the issue.
srun -n 2 --mpi=pmi2 --slurmd-debug=5 ./helloMPI

Also, to determine whether this is an MVAPICH2-specific issue, can you please
try running the following command?
srun -n 2 --mpi=pmi2 hostname
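
As a sanity check, it may also be worth confirming that the pmi2 plugin is
actually available in your Slurm build:
srun --mpi=list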

Thanks,
Sourav


On Thu, Nov 24, 2016 at 6:55 AM, Manuel Rodríguez Pascual <
manuel.rodriguez.pascual at gmail.com> wrote:

> Hi all,
>
> I am trying to make MVAPICH2 work with Slurm, but I keep running into
> issues. I know there are quite a lot of threads on the subject, but none of
> them seems to solve my problem: Slurm is executing two serial jobs
> instead of a single parallel one.
>
> Below I have included quite a lot of information about how I have
> configured my cluster and the different tests I have performed, in
> case it helps.
> ---
> ---
> COMPILATION
>
> --- Slurm 17.02.0-0pre2:
> ./configure --prefix=/home/localsoft/slurm/
>
> slurm.conf:
> MpiDefault=none
>
>
> --- MVAPICH2 2.2:
>
>
> After quite a lot of testing, I have been able to compile MVAPICH2 with
> the following environment and options (config.log is attached to this
> mail):
>
> Environment vars:
> LD_LIBRARY_PATH=/usr/local/lib:/home/localsoft/slurm/lib:/home/localsoft/mvapich2/lib
> (plus some unrelated entries)
> MPICHLIB_LDFLAGS='-Wl,-rpath,/home/localsoft/slurm/lib
> -Wl,-rpath,/home/localsoft/mvapich2/lib'
>
> Compilation:
>
> ./configure --prefix=/home/localsoft/mvapich2 --disable-mcast
> --with-slurm=/home/localsoft/slurm --with-pmi=pmi2 --with-pm=slurm
> --disable-romio
>
> Then, on every node of my cluster I have set LD_LIBRARY_PATH to the same
> value.
>
> My code is compiled with:
> mpicc  helloWorldMPI.c -o helloWorldMPI -L/home/localsoft/slurm/lib
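>
> For reference, helloWorldMPI.c is essentially the standard MPI hello world;
> a minimal sketch that matches the output below (the actual source may
> differ in details):
>
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char *argv[])
> {
>     int rank, size, len;
>     char name[MPI_MAX_PROCESSOR_NAME];
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank of this process   */
>     MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of ranks  */
>     MPI_Get_processor_name(name, &len);    /* host running this rank */
>
>     printf("Process %d of %d is on %s\n", rank, size, name);
>     printf("Hello world from process %d of %d\n", rank, size);
>     printf("Goodbye world from process %d of %d\n", rank, size);
>
>     MPI_Finalize();
>     return 0;
> }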
>
>
> ---
> ---
> EXECUTION:
>
> --serial jobs: OK
> $ ./helloWorldMPI
> Process 0 of 1 is on acme31.ciemat.es
> Hello world from process 0 of 1
> Goodbye world from process 0 of 1
>
>
> $ srun   ./helloWorldMPI
> Process 0 of 1 is on acme11.ciemat.es
> Hello world from process 0 of 1
> Goodbye world from process 0 of 1
>
> --parallel jobs: As you can see, Slurm is executing two serial jobs
> instead of a single parallel one.
>
> $ srun -n 2   ./helloWorldMPI
> Process 0 of 1 is on acme11.ciemat.es
> Hello world from process 0 of 1
> Goodbye world from process 0 of 1
>
> Process 0 of 1 is on acme11.ciemat.es
> Hello world from process 0 of 1
> Goodbye world from process 0 of 1
>
>
> $ srun -n 2 --tasks-per-node=1   ./helloWorldMPI
> Process 0 of 1 is on acme11.ciemat.es
> Hello world from process 0 of 1
> Goodbye world from process 0 of 1
>
> Process 0 of 1 is on acme12.ciemat.es
> Hello world from process 0 of 1
> Goodbye world from process 0 of 1
>
>
> --different Slurm MPI types:
>
> $ srun -n 2 --mpi=none ./helloWorldMPI
> Process 0 of 1 is on acme11.ciemat.es
> Hello world from process 0 of 1
>
> Goodbye world from process 0 of 1
> Process 0 of 1 is on acme11.ciemat.es
> Hello world from process 0 of 1
> Goodbye world from process 0 of 1
>
>
> $ srun -n 2 --mpi=mvapich ./helloWorldMPI
> Process 0 of 1 is on acme11.ciemat.es
> Hello world from process 0 of 1
> Goodbye world from process 0 of 1
>
> Process 0 of 1 is on acme11.ciemat.es
> Hello world from process 0 of 1
> Goodbye world from process 0 of 1
>
> $ srun -n 2 --mpi=pmi2 ./helloWorldMPI
> srun: error: task 0 launch failed: Unspecified error
> srun: error: task 1 launch failed: Unspecified error
>
>
> ---
> ---
>
> Any clue on what's wrong?
>
> Thanks for your help,
>
>
> Manuel
>
>
>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>