[mvapich-discuss] problem on mvapich2-1.8.1 checkpoint/restart with BLCR

Raghunath rajachan at cse.ohio-state.edu
Fri Feb 15 04:51:07 EST 2013


Suja,

There is a known bug in the 1.9-alpha2 version that causes the
configure script to look for FUSE when Checkpoint-Restart support is
enabled, even when the "--disable-ckpt-aggregation" flag is given. The
fix for this will be available as part of the next release.

Meanwhile, you can work around this bug by using the alternative
"--without-fuse" flag in place of "--disable-ckpt-aggregation".
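
A minimal sketch of that substitution, using the configure options from
the quoted report below with only the one flag swapped. The
CONFIGURE_FLAGS variable and the check for ./configure are illustrative
additions, and the remaining options are assumed to stay unchanged:

```shell
# Options from the original report, with --disable-ckpt-aggregation
# replaced by --without-fuse (workaround for the 1.9a2 configure bug).
CONFIGURE_FLAGS="--with-device=ch3:mrail --with-rdma=gen2 \
--without-fuse --disable-rdma-cm --enable-ckpt \
--with-blcr=/usr/local --enable-g=all --enable-error-messages=all \
--enable-shared --with-file-system=nfs --enable-xrc \
--prefix=/share/apps/mvapich2-1.9a2"

# Only run configure when we are actually inside the source tree.
if [ -x ./configure ]; then
    # Left unquoted on purpose so the string splits into separate flags.
    ./configure $CONFIGURE_FLAGS
else
    echo "Run this from the top of the mvapich2-1.9a2 source tree." >&2
fi
```

If configure still stops at the "checking checkpoint aggregation
components" step, the flag was probably not picked up.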
--
Raghu


On Fri, Feb 15, 2013 at 4:45 AM, Suja Ramachandran <sujaram at igcar.gov.in> wrote:
> Hi,
>
> I am having trouble building mvapich2-1.9a2. I am configuring it with
>
> ./configure --with-device=ch3:mrail --with-rdma=gen2
> --disable-ckpt-aggregation  --disable-rdma-cm --enable-ckpt
> --with-blcr=/usr/local --enable-g=all --enable-error-messages=all
> --enable-shared --with-file-system=nfs --enable-xrc
> --prefix=/share/apps/mvapich2-1.9a2
>
> It's giving errors:
> configure: checking checkpoint aggregation components
> checking for library containing fuse_new... no
> configure: error: fuse library not found
>
> I don't have the FUSE library installed. Why is it checking for checkpoint
> aggregation components even after giving the option
> --disable-ckpt-aggregation?
>
> thanks and regards,
> suja
>
>
> On Friday 15 February 2013 01:51 PM, Raghunath wrote:
>>
>> Suja,
>>
>>>   /share/apps/mvapich2-1.8.1/bin/mpirun_rsh -np 8 -hostfile ./hostfile
>>> MV2_IBA_HCA=mlx4_0  MV2_USE_RDMAOE=1 MV2_DEFAULT_PORT=1
>>> MV2_CKPT_FILE=~rpmaps/checkpoint/scripts/mvapichckpt
>>> MV2_DEBUG_SHOW_BACKTRACE=1 ./vector
>>>
>>> Also, I have tried it by avoiding the options MV2_IBA_HCA=mlx4_0
>>> MV2_USE_RDMAOE=1 MV2_DEFAULT_PORT=1.
>>>
>>> The environment variables set are
>>> export PATH=/share/apps/mvapich2-1.8.1/bin/:$PATH
>>> export LD_LIBRARY_PATH=/share/apps/mvapich2-1.8.1/lib:/usr/local/lib:$LD_LIBRARY_PATH
>>
>> Thanks for sending this information. I see nothing strange here.
>>
>>> I have not yet installed the Fault Tolerance Backplane (FTB). Is that
>>> mandatory for checkpoint/restart?
>>
>> No, FTB is not mandatory to get Checkpoint/Restart working with
>> MVAPICH. The only external library that is "mandatory" for the CR
>> mechanism to work is BLCR.
>>
>>> I will also try with mvapich2-1.9
>>>
>>> (FYI, 'vector' is the executable of the vector addition program given
>>> here:
>>> http://www.cs.umanitoba.ca/~comp4510/examplesDIR/vsum.c )
>>
>> Thanks for this pointer. I was able to checkpoint this sample
>> application successfully as well (again, I compiled MVAPICH using the
>> same config flags that you have used).
>>
>> Do let us know if you continue facing issues even after upgrading to
>> MVAPICH2-1.9
>>
>> --
>> Raghu
>>
>

