[mvapich-discuss] mvapich2 1.4rc1 blcr and mpirun_rsh

Karthik Gopalakrishnan gopalakk at cse.ohio-state.edu
Thu Jul 9 07:46:46 EDT 2009


Hi Alex.

Your understanding is correct: mpd is no longer required for the
Checkpoint/Restart feature, and you can use mpirun_rsh instead. You can
also switch between mpirun_rsh and mpd at runtime.
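For reference, once the job is running under mpirun_rsh, a checkpoint of
the whole job is normally requested with BLCR's cr_checkpoint and resumed
with cr_restart. A minimal sketch (the PID and context-file name below are
placeholders; please check the MVAPICH2 1.4 user guide for the exact steps
on your build):

# request a checkpoint of the running job, identified by the PID of mpirun_rsh
cr_checkpoint -p <PID of mpirun_rsh>
# later, restart the job from the context file BLCR wrote for that process
cr_restart <context file written for mpirun_rsh>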

However, changes have been made in the MPI library to support this
feature. Judging from the cr.c line number reported in the error
message, IMB-MPI1 appears to have been compiled against an older
version of MVAPICH2. Please recompile IMB against MVAPICH2 1.4rc1 and
rerun IMB-MPI1.
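A rough sketch of the rebuild, assuming the make_mpich makefile that ships
with IMB (which takes the MPI install root via the MPI_HOME variable) and
the install prefix from your configure line; adjust the paths to your setup:

# make sure the freshly built MVAPICH2 is picked up first
export PATH=/cvos/shared/apps/ofed/1.4/mpi/gcc/mvapich2-1.4rc1/bin:$PATH
which mpicc    # should point into the mvapich2-1.4rc1 tree
# rebuild the benchmark from the IMB source tree
cd <IMB source directory>/src
make -f make_mpich clean
make -f make_mpich MPI_HOME=/cvos/shared/apps/ofed/1.4/mpi/gcc/mvapich2-1.4rc1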

Also note that mpirun_rsh takes MV2_CKPT_FILE as a command-line
parameter and does not pick it up from the environment. You need to run
IMB as follows:
mpirun_rsh -ssh -np 2 -hostfile ./nodes MV2_CKPT_FILE=/home/user/chkpoint ./IMB-MPI1
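Any other MV2_* settings should be passed the same way. For example, to
also take automatic checkpoints periodically (MV2_CKPT_INTERVAL is in
minutes; the value below is only an illustration):
mpirun_rsh -ssh -np 2 -hostfile ./nodes MV2_CKPT_FILE=/home/user/chkpoint MV2_CKPT_INTERVAL=30 ./IMB-MPI1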

Please let us know if this helps.

Regards,
Karthik

On Thu, Jul 9, 2009 at 3:19 AM, Alex Ninaber
<alex.ninaber at clustervision.com> wrote:
>
> Dear list,
>
> One of the new features in 1.4rc1 is listed as:
>
> "o (NEW) Scalable Checkpoint-restart with mpirun_rsh framework "
>
> Assuming this means that mpd is no longer needed, we tried mpirun_rsh
> directly, but we get the following error:
>
> mpirun_rsh -ssh -np 2 -hostfile ./nodes ./IMB-MPI1
> [Rank 0][cr.c: line 186][Rank 0][cr.c: line 186]connect 0 failed
> connect 1 failed
> MPI process (rank: 1) terminated unexpectedly on quad01
> Exit code -5 signaled from quad01
> MPI process (rank: 0) terminated unexpectedly on quad01
> Terminated
>
> In .bashrc:
> export MV2_CKPT_FILE=/home/user/chkpoint
>
> Config:
>
> ./configure --prefix=/cvos/shared/apps/ofed/1.4/mpi/gcc/mvapich2-1.4rc1/ \
>   --with-rdma=gen2 --enable-blcr \
>   --disable-romio --disable-rdma-cm \
>   --with-ib-libpath=/cvos/shared/apps/ofed/1.4/lib64 \
>   --with-blcr-libpath=/cvos/shared/apps/blcr/0.8.2/lib/ \
>   --with-blcr-include=/cvos/shared/apps/blcr/0.8.2/include/ \
>   --with-ib-include=/cvos/shared/apps/ofed/1.4/include/ \
>   --enable-header-caching
>
>
> Are we missing something? Are both the mpirun_rsh and mpd mechanisms
> available, or do we need to choose one at install time?
>
> Thanks & regards,
>
> Alex
>
>
> (please cc alex.ninaber at clustervision.com)
>
>
>

