[mvapich-discuss] mvapich2 1.4rc1 blcr and mpirun_rsh

Alex Ninaber alex.ninaber at clustervision.com
Thu Jul 9 03:19:32 EDT 2009


Dear list,

One of the new features in 1.4rc1 is listed as:

"o (NEW) Scalable Checkpoint-restart with mpirun_rsh framework "

Assuming this means that mpd is no longer needed, we ran the IMB benchmark 
with mpirun_rsh and got the following error:

mpirun_rsh -ssh -np 2 -hostfile ./nodes ./IMB-MPI1
[Rank 0][cr.c: line 186][Rank 0][cr.c: line 186]connect 0 failed
connect 1 failed
MPI process (rank: 1) terminated unexpectedly on quad01
Exit code -5 signaled from quad01
MPI process (rank: 0) terminated unexpectedly on quad01
Terminated

In .bashrc:
export MV2_CKPT_FILE=/home/user/chkpoint
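
For what it is worth, the MV2_CKPT_FILE setting above can be sanity-checked on each node with something like the sketch below. It assumes the variable is used as a path plus base filename for the checkpoint files, so only the directory part has to exist and be writable. The function name check_ckpt_prefix is our own and is not part of MVAPICH2 or BLCR, and passing this check does not by itself rule out other causes of the connect failure.

```shell
#!/bin/sh
# Hypothetical helper (not part of MVAPICH2 or BLCR): verify that the
# directory part of a checkpoint file prefix exists and is writable.
# Assumption: MV2_CKPT_FILE is a path plus base filename, so only the
# parent directory needs to exist before the job starts.
check_ckpt_prefix() {
    prefix=$1
    dir=$(dirname "$prefix")
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "directory not writable: $dir"
        return 2
    fi
    echo "ok: $dir"
    return 0
}
```

On a compute node this would be called as check_ckpt_prefix "$MV2_CKPT_FILE", which also shows whether the variable from .bashrc is set at all in that shell.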

Config:

./configure --prefix=/cvos/shared/apps/ofed/1.4/mpi/gcc/mvapich2-1.4rc1/ \
    --with-rdma=gen2 --enable-blcr \
    --disable-romio --disable-rdma-cm \
    --with-ib-libpath=/cvos/shared/apps/ofed/1.4/lib64 \
    --with-blcr-libpath=/cvos/shared/apps/blcr/0.8.2/lib/ \
    --with-blcr-include=/cvos/shared/apps/blcr/0.8.2/include/ \
    --with-ib-include=/cvos/shared/apps/ofed/1.4/include/ \
    --enable-header-caching


Are we missing something? Are both the mpirun_rsh and mpd mechanisms 
available in a single build, or should we choose one during install?

Thanks & regards,

Alex


(please cc alex.ninaber at clustervision.com)




