[mvapich-discuss] mvapich2+slurm+blcr Oh, My!

David Brown dmlb2000 at gmail.com
Wed Jun 23 15:45:18 EDT 2010


I'm not sure who to ask about this or which part of my setup is wrong,
but I can't get mvapich2, slurm, and blcr to work together.

I built mvapich2 1.5rc2 this way:

./configure \
                --prefix=%{mpidir} \
                --mandir=%{mpidir}/man \
                --enable-error-checking=runtime \
                --enable-timing=none \
                --enable-g=mem,dbg,meminit \
                --enable-sharedlibs=gcc \
                --with-rdma=gen2 \
                --enable-romio \
                --with-file-system=lustre+nfs \
                --with-slurm=/usr \
                --with-pmi=slurm \
                --with-pm=no \
                --enable-threads=multiple \
                --with-thread-package=pthreads \
                --disable-mpe \
                --without-mpe \
                --disable-nmpi-as-mpi \
                --enable-f77 \
                --enable-f90 \
                --enable-cxx \
                --enable-blcr

I built slurm this way:

rpmbuild -ta --with blcr --with postgresql slurm-2.1.9.tar.bz2

I configured slurm with:

CheckpointType=checkpoint/blcr

And I built IOR and launched it this way:

$ srun --checkpoint-dir=/lustre -n 8 -N 4 ./IOR -i 4 -b 32g -T 600 -E \
    -k -e -t 1m -o /lustre/testFile
[Rank 1][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
[Rank 3][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
[Rank 0][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
[Rank 2][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
[Rank 5][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
[Rank 4][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
[Rank 7][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
[Rank 6][cr.c: line 781]MV2_CKPT_MPD_BASE_PORT is not set
srun: error: x11: tasks 2-3: Exited with exit code 255
srun: error: x10: tasks 0-1: Exited with exit code 255
srun: error: x12: tasks 4-5: Exited with exit code 255
srun: error: x13: tasks 6-7: Exited with exit code 255

So I'm confused; this is obviously not working. Do I need to use some
other mechanism to launch the job? Am I missing some configuration somewhere?
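
One way to narrow this down might be to check whether
MV2_CKPT_MPD_BASE_PORT is visible in the environment at all. This is a
minimal sketch, assuming a POSIX shell with printenv available. The
helper name check_var is made up for illustration and is not part of
mvapich2 or slurm:

```shell
# Hypothetical helper (illustrative name): report whether the named
# variable is present in the current environment.
check_var() {
    name="$1"
    # printenv exits non-zero when the variable is not in the environment
    if printenv "$name" >/dev/null 2>&1; then
        echo "$name is set to: $(printenv "$name")"
    else
        echo "$name is not set"
    fi
}

# The variable named in the cr.c error messages above
check_var MV2_CKPT_MPD_BASE_PORT
```

Saved as a script and launched through srun with the same -n 8 -N 4
arguments, this would print one line per task. It only reports whether
the variable is present; it says nothing about what value mvapich2
expects there.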

Thanks,
- David Brown

