[mvapich-discuss] SCR+BLCR usage

Arjun J Rao rectangle.king at gmail.com
Tue Jan 28 07:19:02 EST 2014


After installing SLURM, I installed MVAPICH2 with the configure parameters
--enable-ckpt --with-scr --with-pm=no --with-pmi=slurm
I then inserted the BLCR module into the kernel using insmod, and set the
following variables both in the environment and in the /etc/scr.conf file:
SCR_FLUSH=2
SCR_CACHE_BASE=/home/username/Cache
SCR_CNTL_BASE=/home/username/Control
SCR_HALT_SECONDS=3600
SCR_PREFIX=/home/username/Checkpoint
SCR_RUNS=3
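
As a sketch, the environment half of these settings could be applied with plain shell exports before launching the job. This assumes a Bourne-style shell; the values are simply the ones listed above.

```shell
# Export the SCR settings listed above so that processes started
# from this shell inherit them.
export SCR_FLUSH=2
export SCR_CACHE_BASE=/home/username/Cache
export SCR_CNTL_BASE=/home/username/Control
export SCR_HALT_SECONDS=3600
export SCR_PREFIX=/home/username/Checkpoint
export SCR_RUNS=3
```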

In the scr.conf file, I have
CNTLDIR=/home/username/Control  BYTES=1000000000  [1 followed by nine 0s, i.e. 1 GB]
SCR_CNTL_BASE=/home/username/Control

CACHEDIR=/home/arjun/Cache  BYTES=12000000000  [12 followed by nine 0s, i.e. 12 GB]
SCR_CACHE_BASE=/home/arjun/Cache

SCR_DB_ENABLE=0
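
Since the same SCR_* names end up in two places here, one quick way to see which of them are defined both in the environment and in the file is a small shell helper along these lines. This is only a sketch: scr_dupes is a hypothetical name, not something shipped with SCR or MVAPICH2, and it assumes the config file uses simple NAME=value entries at the start of a line.

```shell
# scr_dupes FILE
# Print every SCR_* name that is assigned in FILE and is also set
# in the current environment. Hypothetical helper for illustration.
scr_dupes() {
    conf="$1"
    # Extract names from lines of the form SCR_NAME=value.
    sed -n 's/^\(SCR_[A-Z_]*\)=.*/\1/p' "$conf" | sort -u |
    while read -r name; do
        # Report the name if the environment defines it as well.
        if env | grep -q "^${name}="; then
            echo "$name"
        fi
    done
}
```

Calling `scr_dupes /etc/scr.conf` prints one name per line for each variable set in both places. It only reports the overlap; it says nothing about which of the two settings SCR actually honors.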

I run my MPI jobs using
salloc -N2 -n4 bash (to create a job allocation of 2 nodes and 4 processes)
srun -N2 -n4 MPIExecutable


On Tue, Jan 28, 2014 at 10:45 AM, Raghu <rajachan at cse.ohio-state.edu> wrote:

> Hi Arjun,
>
> Thanks for your note. The capability to transparently checkpoint MPI
> applications launched using SLURM's 'srun' was added to
> MVAPICH2 in the latest release (2.0-beta). While it has been tested
> extensively with BLCR, I have not tested all possible
> configurations with the BLCR+SLURM+SCR setup.  Can you send me
> your MVAPICH2 (and SCR) configuration parameters, and I'll try to see
> if I can reproduce your issue?
>
> Raghu
>
>
> On Mon, Jan 27, 2014 at 11:59 PM, Arjun J Rao <rectangle.king at gmail.com>
> wrote:
> > The manual mentions in passing in Section 6.15.2.2 (Transparent Multilevel
> > Checkpointing) that SCR+BLCR can be run with the same level of ease as
> > running BLCR alone. My job launcher is SLURM (the manual mentions using
> > mpirun_rsh or mpiexec as job launchers)
> >
> > Does anybody have experience using this? I ask because while I am able to
> > run SCR on its own well, using the SCR installed with MVAPICH2 causes
> > weird errors:
> > "The variable SCR_CNTL_BASE cannot be set in the environment or
> > configuration file"
> >
>