[mvapich-discuss] SLURM + SCR + BLCR with MVAPICH2.0b

Arjun J Rao rectangle.king at gmail.com
Tue Dec 24 01:25:19 EST 2013


I first installed BLCR and SLURM, then built MVAPICH2 with the following
options:
./configure --enable-ckpt --with-scr --with-pm=no --with-pmi=slurm
My intention is to hand the checkpoints created by BLCR over to SCR in
order to get high checkpoint-writing throughput. However, compiling a
simple MPI program with mpicc and running it with srun yields the
following errors.

srun -N2 -n12 MPIExecutable

SCR v1.1-8 ABORT : rank 1 on machine2: Failed to create store descriptor
for control directory @
src/mpid/ch3/channels/common/src/scr/scr_storedesc.c:299
In: PMI_Abort(0,application called MPI_Abort(MPI_COMM_WORLD,0) - process 1)
SCR v1.1-8 ABORT : rank 0 on machine1: Failed to create store descriptor
for control directory @
src/mpid/ch3/channels/common/src/scr/scr_storedesc.c:299
In: PMI_Abort(0,application called MPI_Abort(MPI_COMM_WORLD,0) - process 0)
.
.
.
SCR v1.1-8 ERROR: rank 3 on machine1: SCR_CNTL_BASE cannot be set in the
environment or user configuration file, ignoring setting
SCR v1.1-8 ERROR: rank 1 on machine1: SCR_CNTL_BASE cannot be set in the
environment or user configuration file, ignoring setting
.
.
.
slurmd[machine1]: ***STEP 195.1 KILLED AT 2013-12-24T10:11:23 WITH SIGNAL
9***
srun: Job step aborted: Waiting upto 2 seconds for job step to finish

slurmd[machine1]: *** STEP 195.1 KILLED AT 2013-12-24T10:11:23 WITH SIGNAL
9****
slurmd[machine2]: *** STEP 195.1 KILLED AT 2013-12-24T10:11:23 WITH SIGNAL
9****
.
.
.

There was actually a lot of output, but I have included only one instance
of each message type. Also, how exactly must SCR be made to work with BLCR
when SLURM's srun is used as the launcher? Can srun be used at all? Should
I use scr_srun (SCR's wrapper around srun)? Or am I restricted to using
mpirun_rsh?

As for my SCR environment variables, I have set SCR_CNTL_BASE, SCR_RUNS,
SCR_CACHE_BASE, SCR_PREFIX and SCR_FLUSH.
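
A minimal sketch for checking which of these five variables are visible in
the shell that launches the job, assuming a POSIX shell. The function name
report_scr_env is hypothetical and not part of SCR or SLURM. This check is
local only: it does not show what the ranks on the compute nodes see, and
the error output above indicates that SCR ignores SCR_CNTL_BASE when it
comes from the environment.

```shell
#!/bin/sh
# Report, for each SCR variable named above, whether it is set in the
# current environment. report_scr_env is a hypothetical helper name.
report_scr_env() {
    for name in SCR_CNTL_BASE SCR_RUNS SCR_CACHE_BASE SCR_PREFIX SCR_FLUSH; do
        # eval is safe here: $name only takes the fixed values listed above.
        eval "value=\${$name-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is unset or empty\n' "$name"
        fi
    done
}

report_scr_env
```

Running the script in the same shell session as the srun command prints one
line per variable, which makes a missing or misspelled export easy to spot.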