[mvapich-discuss] BLCR+FTB+MVAPICH2:problems about migration between two nodes

Jian Lin lin.2180 at osu.edu
Fri Apr 3 10:16:39 EDT 2015


Hi, Qingze,

The problems are most likely related to the prelinking feature of your OS.
Prelinking must be disabled before a job can be restarted on other
nodes. Please refer to the following BLCR note, change your OS
settings, and try again.
<https://upc-bugs.lbl.gov/blcr/doc/html/FAQ.html#prelink>
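
As a rough sketch of what that note describes, assuming a Red Hat-style
system where the prelink settings live in /etc/sysconfig/prelink (that
path and the helper name disable_prelink_config are assumptions for
illustration, not something I have checked on your nodes), the change
would look roughly like this, run as root on every node:

```shell
#!/bin/sh
# Sketch only: turn off prelinking so checkpointed processes can restart
# on a different node. Assumes GNU sed and a Red Hat-style config layout.

# Hypothetical helper: set PRELINKING=no in the given config file,
# leaving every other setting untouched.
disable_prelink_config() {
    sed -i 's/^PRELINKING=.*/PRELINKING=no/' "$1"
}

# Usage on a real node (commented out so the sketch has no side effects):
#   disable_prelink_config /etc/sysconfig/prelink
#   prelink --undo --all    # revert binaries and libraries already prelinked
```

After undoing prelinking, take a fresh checkpoint before trying the
restart again, since an old checkpoint still refers to the prelinked
libraries.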

Thanks!


On Thu, 2 Apr 2015 10:48:52 +0800
hljgqz <15776869853 at 163.com> wrote:

> To whom it may concern,
>
> I have two problems when using BLCR+FTB+MVAPICH2. I can restart an MPI
> job on the original nodes, but I can't restart it on a different set
> of nodes.
>
> 1) I can't restart the MPI job on a different node. I followed the
> user guide. I have two nodes named node1 and node3. First, I run
> "mpirun_rsh -np 2 -hostfile hosts ./proc" on node3; the hosts file
> consists of node1:4. Then I run cr_checkpoint -p <pid>, and I can
> restart the job using the original hosts file. But if I change hosts
> (the hostfile) to node3:4, as the user guide says, and restart the
> job, it doesn't work and reports this error:
>
> [node3:mpispawn_0][child_handler] MPI process (rank: 1, pid: 5174) terminated with signal 4 -> abort job
> [node3:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node node3 aborted: MPI process error (1)
>
> 2) I tried mpirun_rsh -np 4 -hostfile ./hosts -sparehosts
> ./spare_hosts ./prog, but it doesn't work and gives back these
> messages:
>
> [root@node3 node0]# mpirun_rsh -np 4 -hostfile ./hosts -sparehosts ./spare_hosts ./hello
> [root@node3 node0]# [FTB_WARNING][ftb_agent.c: line 46]FTBM_Wait failed 1
>
> My configuration:
>
> [root@node3 node0]# mpiname -a
> MVAPICH2 2.1rc2 Thu Mar 12 20:00:00 EDT 2014 ch3:mrail
>
> Compilation
> CC: gcc    -DNDEBUG -DNVALGRIND -O2
> CXX: g++   -DNDEBUG -DNVALGRIND -O2
> F77: gfortran -L/home/node0/16/blcr/lib -L/lib   -O2
> FC: gfortran   -O2
>
> Configuration
> --enable-ckpt-migration --with-blcr=/home/node0/16/blcr --without-hwloc
> 
> I look forward to your reply.
> Sincerely yours,
> Gong Qingze
> 



-- 
Jian Lin
http://linjian.org


