[mvapich-discuss] mvapich2 1.4rc1 blcr and mpirun_rsh
Alex Ninaber
alex.ninaber at clustervision.com
Thu Jul 9 03:19:32 EDT 2009
Dear list,
One of the new features in 1.4rc1 is listed as:
"o (NEW) Scalable Checkpoint-restart with mpirun_rsh framework "
Assuming this means that mpd is no longer needed, we launched directly with
mpirun_rsh and got the following error:
mpirun_rsh -ssh -np 2 -hostfile ./nodes ./IMB-MPI1
[Rank 0][cr.c: line 186][Rank 0][cr.c: line 186]connect 0 failed
connect 1 failed
MPI process (rank: 1) terminated unexpectedly on quad01
Exit code -5 signaled from quad01
MPI process (rank: 0) terminated unexpectedly on quad01
Terminated
In .bashrc:
export MV2_CKPT_FILE=/home/user/chkpoint
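One thing worth ruling out is whether this setting reaches the MPI processes at all. mpirun_rsh starts them over ssh, so a variable exported in .bashrc only takes effect if the non-interactive shell on each node reads that file, and many .bashrc files return early for non-interactive shells. The sketch below assumes bash on the nodes and a hostfile with one hostname per line. The function names are hypothetical. Whether this is related to the cr.c connect failure has not been established.

```shell
#!/bin/sh
# Hypothetical helper: check that the directory part of a checkpoint
# prefix (such as the value of MV2_CKPT_FILE) exists and is writable.
check_ckpt_prefix() {
    dir=$(dirname "$1")
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir" >&2
        return 1
    fi
    echo "ok: $dir"
}

# Hypothetical helper: show MV2_CKPT_FILE as a non-interactive ssh session
# sees it on each host of a hostfile (assumed: one hostname per line).
show_remote_ckpt_env() {
    while read -r host; do
        [ -n "$host" ] || continue
        # -n stops ssh from consuming the rest of the hostfile on stdin
        ssh -n "$host" 'echo "$(hostname): MV2_CKPT_FILE=${MV2_CKPT_FILE:-<unset>}"'
    done < "$1"
}
```

For example, run `check_ckpt_prefix "$MV2_CKPT_FILE"` on the front end and `show_remote_ckpt_env ./nodes` for the compute nodes. If the variable shows up as `<unset>` remotely, you can bypass .bashrc entirely because mpirun_rsh also accepts `NAME=value` pairs before the executable. For example: `mpirun_rsh -ssh -np 2 -hostfile ./nodes MV2_CKPT_FILE=/home/user/chkpoint ./IMB-MPI1`.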
Config:
./configure --prefix=/cvos/shared/apps/ofed/1.4/mpi/gcc/mvapich2-1.4rc1/ \
--with-rdma=gen2 --enable-blcr \
--disable-romio --disable-rdma-cm \
--with-ib-libpath=/cvos/shared/apps/ofed/1.4/lib64 \
--with-blcr-libpath=/cvos/shared/apps/blcr/0.8.2/lib/ \
--with-blcr-include=/cvos/shared/apps/blcr/0.8.2/include/ \
--with-ib-include=/cvos/shared/apps/ofed/1.4/include/ \
--enable-header-caching
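Separately from the launcher question, it may help to confirm that the --enable-blcr build is actually using BLCR on the nodes. The sketch below assumes Linux nodes. `links_libcr` and `blcr_modules_loaded` are hypothetical helpers. The module name `blcr` is the one BLCR 0.8.x loads, alongside `blcr_imports`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given binary (or one of its shared
# library dependencies, which ldd also lists) dynamically links BLCR's libcr.
links_libcr() {
    ldd "$1" 2>/dev/null | grep -q 'libcr\.so'
}

# Hypothetical helper: succeed if the BLCR kernel module is loaded on this
# node. /proc/modules lists one loaded module per line, name first.
blcr_modules_loaded() {
    grep -q '^blcr ' /proc/modules 2>/dev/null
}
```

For example, run `links_libcr ./IMB-MPI1 || echo "IMB-MPI1 does not link libcr"` and `blcr_modules_loaded || echo "blcr not loaded on $(hostname)"` on each node. For a program built against a BLCR-enabled MVAPICH2, the first check would normally be expected to succeed.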
Are we missing something? Can the mpirun_rsh and mpd checkpoint mechanisms
both be available in a single build, or do we have to choose one at
configure time?
Thanks & regards,
Alex
(please cc alex.ninaber at clustervision.com)