[mvapich-discuss] mvapich2+slurm+blcr Oh, My! (fwd)

David Brown dmlb2000 at gmail.com
Thu Jun 24 13:58:35 EDT 2010


Thanks, I rebuilt mvapich2 to use mpirun_rsh and now it's running fine.

A question about the environment variables:

Is MV2_CKPT_INTERVAL specified in minutes, seconds, or hours?

Also, what would need to be done to get the slurm combination working?
Is slurm supposed to set up the controlling daemon and open all the
ports for the processes to communicate?

Just curious, Thanks.
 - David Brown
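
For anyone following the thread, here is a minimal sketch of the
`mvapich2 + mpirun_rsh + blcr' launch being discussed, reusing the hosts
and IOR invocation from the srun example further down. The checkpoint
path and the interval value are placeholders (the interval's units are
exactly what is being asked above, so check the user guide for your
version before relying on a number):

```shell
# Hypothetical mpirun_rsh launch of the same 8-task IOR job with BLCR
# checkpointing. Hosts x10-x13 and the /lustre paths come from the srun
# example in this thread; the checkpoint settings are assumptions.
mpirun_rsh -np 8 x10 x10 x11 x11 x12 x12 x13 x13 \
    MV2_CKPT_FILE=/lustre/ckpt/context \
    MV2_CKPT_INTERVAL=30 \
    ./IOR -i 4 -b 32g -T 600 -E -k -e -t 1m -o /lustre/testFile
```

mpirun_rsh forwards the `KEY=VALUE' pairs listed before the executable
into each rank's environment, which is how the checkpoint variables
reach the MPI processes.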

On Wed, Jun 23, 2010 at 9:29 PM, Dhabaleswar Panda
<panda at cse.ohio-state.edu> wrote:
> Hi,
>
> Thanks for reporting this issue. This is to let you know that the
> following combinations (in addition to other combinations like hydra,
> etc.) have been tested:
>
>   - mvapich2 + slurm
>   - mvapich2 + mpirun_rsh
>   - mvapich2 + mpirun_rsh + blcr
>
> However, the combination you are trying to use (mvapich2 + slurm + blcr)
> has not been tested. We will take a look at it.
>
> In the meantime, I suggest you try the `mvapich2 + mpirun_rsh +
> blcr' combination.
>
> Sections 5.2.1 and 6.3 of the latest MVAPICH2 1.5 user guide describe
> the detailed steps for using mpirun_rsh and checkpoint/restart with
> BLCR, respectively.
>
> The User Guide can be obtained from the following URL:
>
> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.5rc2.html
>
> Thanks,
>
> DK
>
>
> On Wed, 23 Jun 2010, David Brown wrote:
>
>> I'm not sure whom to ask this question or what part of my setup is
>> wrong, but something isn't working right.
>>
>> I built mvapich2 1.5rc2 this way:
>>
>> ./configure \
>>                 --prefix=%{mpidir} \
>>                 --mandir=%{mpidir}/man \
>>                 --enable-error-checking=runtime \
>>                 --enable-timing=none \
>>                 --enable-g=mem,dbg,meminit \
>>                 --enable-sharedlibs=gcc \
>>                 --with-rdma=gen2 \
>>                 --enable-romio \
>>                 --with-file-system=lustre+nfs \
>>                 --with-slurm=/usr \
>>                 --with-pmi=slurm \
>>                 --with-pm=no \
>>                 --enable-threads=multiple \
>>                 --with-thread-package=pthreads \
>>                 --disable-mpe \
>>                 --without-mpe \
>>                 --disable-nmpi-as-mpi \
>>                 --enable-f77 \
>>                 --enable-f90 \
>>                 --enable-cxx \
>>                 --enable-blcr
>>
>> I built slurm this way:
>>
>> rpmbuild -ta --with blcr --with postgresql slurm-2.1.9.tar.bz2
>>
>> I configured slurm with:
>>
>> CheckpointType=checkpoint/blcr
>>
>> And I built IOR and launched it this way:
>>
>> $ srun --checkpoint-dir=/lustre -n 8 -N 4  ./IOR -i 4 -b 32g -T 600 -E
>> -k -e -t 1m -o /lustre/testFile
>> [Rank 1][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
>> [Rank 3][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
>> [Rank 0][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
>> [Rank 2][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
>> [Rank 5][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
>> [Rank 4][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
>> [Rank 7][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
>> [Rank 6][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
>> srun: error: x11: tasks 2-3: Exited with exit code 255
>> srun: error: x10: tasks 0-1: Exited with exit code 255
>> srun: error: x12: tasks 4-5: Exited with exit code 255
>> srun: error: x13: tasks 6-7: Exited with exit code 255
>>
>> So I'm confused; this is obviously not working. Do I need to use some
>> other mechanism to launch? Am I missing configuration somewhere?
>>
>> Thanks,
>> - David Brown
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>
>
