[mvapich-discuss] mvapich2+slurm+blcr Oh, My!
Dhabaleswar Panda
panda at cse.ohio-state.edu
Thu Jun 24 00:07:08 EDT 2010
Hi,
Thanks for reporting this issue. The following combinations (along with
others, such as hydra) have been tested:
- mvapich2 + slurm
- mvapich2 + mpirun_rsh
- mvapich2 + mpirun_rsh + blcr
However, the combination you are trying to use (mvapich2 + slurm + blcr)
has not been tested. We will take a look at it.
In the meantime, I suggest trying the `mvapich2 + mpirun_rsh + blcr'
combination.
Sections 5.2.1 and 6.3 of the latest MVAPICH2 1.5 user guide describe the
detailed steps for using mpirun_rsh and checkpoint/restart with BLCR,
respectively.
The User Guide can be obtained from the following URL:
http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.5rc2.html
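As a rough sketch of what the mpirun_rsh launch might look like for the IOR
run quoted below: the hostfile path and checkpoint file path are hypothetical
placeholders, and the use of MV2_CKPT_FILE is an assumption that should be
checked against Section 6.3 of the user guide. The helper only prints the
command line so it can be reviewed before running it.

```shell
#!/bin/sh
# Sketch only: assemble an mpirun_rsh launch line; nothing is executed.
# ./hosts and /lustre/ckpt/ior are hypothetical placeholders.
# MV2_CKPT_FILE (checkpoint file name prefix) is assumed from the user guide.

build_launch_cmd() {
    np="$1"; hostfile="$2"; ckpt_file="$3"; shift 3
    # mpirun_rsh accepts NAME=VALUE environment settings before the executable.
    printf 'mpirun_rsh -np %s -hostfile %s MV2_CKPT_FILE=%s %s\n' \
        "$np" "$hostfile" "$ckpt_file" "$*"
}

# Same IOR arguments as in the srun attempt, 8 processes in total.
build_launch_cmd 8 ./hosts /lustre/ckpt/ior \
    ./IOR -i 4 -b 32g -T 600 -E -k -e -t 1m -o /lustre/testFile
```

With a job launched this way, a checkpoint would typically be requested with
BLCR's cr_checkpoint against the PID of the mpirun_rsh process, and the job
restarted later with cr_restart on the resulting context file. The exact
invocation for your installation is in the user guide sections above.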
Thanks,
DK
On Wed, 23 Jun 2010, David Brown wrote:
> So I'm not sure whom to ask or which part of my setup is wrong, but
> something isn't working right.
>
> I built mvapich2 1.5rc2 this way:
>
> ./configure \
> --prefix=%{mpidir} \
> --mandir=%{mpidir}/man \
> --enable-error-checking=runtime \
> --enable-timing=none \
> --enable-g=mem,dbg,meminit \
> --enable-sharedlibs=gcc \
> --with-rdma=gen2 \
> --enable-romio \
> --with-file-system=lustre+nfs \
> --with-slurm=/usr \
> --with-pmi=slurm \
> --with-pm=no \
> --enable-threads=multiple \
> --with-thread-package=pthreads \
> --disable-mpe \
> --without-mpe \
> --disable-nmpi-as-mpi \
> --enable-f77 \
> --enable-f90 \
> --enable-cxx \
> --enable-blcr
>
> I built slurm this way:
>
> rpmbuild -ta --with blcr --with postgresql slurm-2.1.9.tar.bz2
>
> I configured slurm with:
>
> CheckpointType=checkpoint/blcr
>
> And I built IOR and launched it this way:
>
> $ srun --checkpoint-dir=/lustre -n 8 -N 4 ./IOR -i 4 -b 32g -T 600 -E
> -k -e -t 1m -o /lustre/testFile
> [Rank 1][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
> [Rank 3][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
> [Rank 0][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
> [Rank 2][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
> [Rank 5][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
> [Rank 4][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
> [Rank 7][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
> [Rank 6][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
> srun: error: x11: tasks 2-3: Exited with exit code 255
> srun: error: x10: tasks 0-1: Exited with exit code 255
> srun: error: x12: tasks 4-5: Exited with exit code 255
> srun: error: x13: tasks 6-7: Exited with exit code 255
>
> So I'm confused; this is obviously not working. Do I need to use some
> other mechanism to launch? Am I missing configuration somewhere?
>
> Thanks,
> - David Brown
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>