[mvapich-discuss] SCR+BLCR usage

Arjun J Rao rectangle.king at gmail.com
Tue Jan 28 07:24:40 EST 2014


I issue my srun commands as:
srun -N2 -n4 --checkpoint 1 --checkpoint-dir /home/username/Checkpoint MPIExecutable
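
For reference, the environment settings and the launch line described in this thread can be collected into one script. This is only a minimal sketch of the setup as reported, not a fix for the SCR_CNTL_BASE error. BASE_DIR, DRY_RUN and SRUN_CMD are hypothetical helper names introduced here for illustration; the SCR variables, their values and the srun flags are copied from the messages in this thread.

```shell
#!/bin/sh
# Sketch only: gathers the SCR settings quoted in this thread.
# BASE_DIR is a hypothetical helper; the thread uses /home/username.
BASE_DIR="${BASE_DIR:-/home/username}"

# SCR variables as listed in the thread
export SCR_FLUSH=2
export SCR_CACHE_BASE="$BASE_DIR/Cache"
export SCR_CNTL_BASE="$BASE_DIR/Control"
export SCR_HALT_SECONDS=3600
export SCR_PREFIX="$BASE_DIR/Checkpoint"
export SCR_RUNS=3

# Warn (do not fail) if a directory is missing on this node
for d in "$SCR_CACHE_BASE" "$SCR_CNTL_BASE" "$SCR_PREFIX"; do
    [ -d "$d" ] || echo "warning: $d does not exist on this node" >&2
done

# Show what the job will inherit
env | grep '^SCR_' | sort

# Launch line from the thread; MPIExecutable is the thread's placeholder name
SRUN_CMD="srun -N2 -n4 --checkpoint 1 --checkpoint-dir $SCR_PREFIX MPIExecutable"

# DRY_RUN (hypothetical, default 1) prints the command instead of running it
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$SRUN_CMD"
else
    # Unquoted on purpose so the string splits into arguments;
    # this assumes the paths contain no spaces.
    $SRUN_CMD
fi
```

Running it with DRY_RUN=0 inside an salloc session would issue the srun command itself. Printing the SCR_ variables first makes it easier to compare what the job inherits against what is written in scr.conf.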


On Tue, Jan 28, 2014 at 5:49 PM, Arjun J Rao <rectangle.king at gmail.com> wrote:

> After installing SLURM, I installed MVAPICH2 using the parameters
> --enable-ckpt --with-scr --with-pm=no --with-pmi=slurm
> I then inserted the BLCR module into the kernel using insmod and set the
> following variables both in the environment and in the /etc/scr.conf file:
> SCR_FLUSH=2
> SCR_CACHE_BASE=/home/username/Cache
> SCR_CNTL_BASE=/home/username/Control
> SCR_HALT_SECONDS=3600
> SCR_PREFIX=/home/username/Checkpoint
> SCR_RUNS=3
>
> In the scr.conf file, I have
> CNTLDIR=/home/username/Control  BYTES=1000000000  [1 followed by 9 zeros, i.e. 1 GB]
> SCR_CNTL_BASE=/home/username/Control
>
> CACHEDIR=/home/arjun/Cache BYTES=12000000000 [12 followed by 9 zeros, i.e. 12 GB]
> SCR_CACHE_BASE=/home/arjun/Cache
>
> SCR_DB_ENABLE=0
>
> I run my MPI jobs using
> salloc -N2 -n4 bash (to create a job allocation of 2 nodes and 4 processes)
> srun -N2 -n4 MPIExecutable
>
>
> On Tue, Jan 28, 2014 at 10:45 AM, Raghu <rajachan at cse.ohio-state.edu> wrote:
>
>> Hi Arjun,
>>
>> Thanks for your note. The capability to transparently checkpoint MPI
>> applications launched using SLURM's 'srun' was added to MVAPICH2 in the
>> latest release (2.0-beta). While it has been tested extensively with
>> BLCR, I have not tested all possible configurations of the
>> BLCR+SLURM+SCR setup. Can you send me your MVAPICH2 (and SCR)
>> configuration parameters, so that I can try to reproduce your issue?
>>
>> Raghu
>>
>>
>> On Mon, Jan 27, 2014 at 11:59 PM, Arjun J Rao <rectangle.king at gmail.com> wrote:
>> > The manual mentions in passing in Section 6.15.2.2 (Transparent Multilevel
>> > Checkpointing) that SCR+BLCR can be run with the same ease as running
>> > BLCR alone. My job launcher is SLURM (the manual mentions using
>> > mpirun_rsh or mpiexec as job launchers).
>> >
>> > Does anybody have experience using this? I ask because, while I am able
>> > to run SCR on its own without problems, using the SCR installed with
>> > MVAPICH2 causes weird errors:
>> > "The variable SCR_CNTL_BASE cannot be set in the environment or
>> > configuration file"
>> >
>> > _______________________________________________
>> > mvapich-discuss mailing list
>> > mvapich-discuss at cse.ohio-state.edu
>> > http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>> >
>>
>
>