[mvapich-discuss] mvapich2 1.4rc1 blcr and mpirun_rsh

Alex Ninaber alex.ninaber at clustervision.com
Fri Jul 10 04:35:40 EDT 2009


Dear Karthik,

Indeed, you're right: I was loading the wrong module.

Regards,

Alex


Karthik Gopalakrishnan wrote:
> Hi Alex.
>
> Your understanding is correct. You no longer need to use mpd for the
> Checkpoint / Restart feature. You can use mpirun_rsh. You can also
> switch between mpirun_rsh and mpd at runtime.
>
> However, changes have been made in the MPI Library to support this
> feature. Looking at the line number in cr.c reported in the error
> message, it looks like IMB-MPI1 has been compiled with an older
> version of MVAPICH2. Please recompile IMB with MVAPICH2-1.4 RC1 and
> rerun IMB-MPI1.
>
> Also note that mpirun_rsh treats MV2_CKPT_FILE as a parameter and does
> not read it from the environment variable. You need to run IMB as
> follows:
> mpirun_rsh -ssh -np 2 -hostfile ./nodes \
>     MV2_CKPT_FILE=/home/user/chkpoint ./IMB-MPI1
>
> Please let us know if this helps.
>
> Regards,
> Karthik
>
> On Thu, Jul 9, 2009 at 3:19 AM, Alex
> Ninaber<alex.ninaber at clustervision.com> wrote:
>   
>> Dear list,
>>
>> One of the new features in 1.4rc1 is listed as:
>>
>> "o (NEW) Scalable Checkpoint-restart with mpirun_rsh framework "
>>
>> Assuming this means that mpd is no longer needed, we get the following
>> error:
>>
>> mpirun_rsh -ssh -np 2 -hostfile ./nodes ./IMB-MPI1
>> [Rank 0][cr.c: line 186][Rank 0][cr.c: line 186]connect 0 failed
>> connect 1 failed
>> MPI process (rank: 1) terminated unexpectedly on quad01
>> Exit code -5 signaled from quad01
>> MPI process (rank: 0) terminated unexpectedly on quad01
>> Terminated
>>
>> In .bashrc:
>> export MV2_CKPT_FILE=/home/user/chkpoint
>>
>> Config:
>>
>> ./configure --prefix=/cvos/shared/apps/ofed/1.4/mpi/gcc/mvapich2-1.4rc1/ \
>>   --with-rdma=gen2 --enable-blcr \
>>   --disable-romio --disable-rdma-cm \
>>   --with-ib-libpath=/cvos/shared/apps/ofed/1.4/lib64 \
>>   --with-blcr-libpath=/cvos/shared/apps/blcr/0.8.2/lib/ \
>>   --with-blcr-include=/cvos/shared/apps/blcr/0.8.2/include/ \
>>   --with-ib-include=/cvos/shared/apps/ofed/1.4/include/ \
>>   --enable-header-caching
>>
>>
>> Are we missing something? Are both the mpirun_rsh and mpd mechanisms
>> available after this build, or should we choose one during install?
>>
>> Thanks & regards,
>>
>> Alex
>>
>>
>> (please cc alex.ninaber at clustervision.com)
>>
>>
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>     
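
For anyone hitting the same problem: since mpirun_rsh takes MV2_CKPT_FILE as
a NAME=VALUE argument rather than reading it from the environment, a small
wrapper can forward the value exported in .bashrc onto the command line.
This is only a sketch; the function names ckpt_arg and run_imb and the
DRY_RUN switch are made up for illustration and are not part of MVAPICH2.
The host file, process count and checkpoint path are the ones used in this
thread.

```shell
# Hypothetical helper: print the NAME=VALUE argument for mpirun_rsh,
# taking the value from the environment when it is set.
ckpt_arg() {
    # Fall back to the path used in this thread when the variable is unset.
    printf 'MV2_CKPT_FILE=%s\n' "${MV2_CKPT_FILE:-/home/user/chkpoint}"
}

# Hypothetical wrapper: builds the command line suggested in the quoted reply.
# With DRY_RUN=1 it only prints the command instead of running it.
run_imb() {
    set -- mpirun_rsh -ssh -np 2 -hostfile ./nodes "$(ckpt_arg)" ./IMB-MPI1
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Running "DRY_RUN=1 run_imb" first shows the exact command line before
anything is launched; IMB-MPI1 still has to be built against MVAPICH2-1.4
RC1, as noted in the quoted reply.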


-- 

------------------------------------------------------------------
Dr Alex Ninaber                          ClusterVision
Technical Manager                        tel NL: +31 20 407 7557
http://www.ClusterVision.com             tel UK: +44 870 080 1980
email:Alex.Ninaber at ClusterVision.com     tel Mob: +31 61 650 4127
support: support at ClusterVision.com       skype: AlexNinaber 


