[mvapich-discuss] Problems with migration in mvapich2-1.8rc1

Iván Cores González ivan.coresg at udc.es
Mon Apr 2 04:53:53 EDT 2012


Hello, 
I am testing the new mvapich2-1.8rc1 release on a small cluster with InfiniBand, and I am having problems with the migration feature. 

I installed FTB, BLCR and MVAPICH2 without problems. MVAPICH2 was configured with: 

./configure --prefix=$HOME/mvapich/mvapich2-1.8rc1/build --with-device=ch3:psm --enable-shared --enable-ckpt --with-blcr=$HOME/blcr-0.8.4/build --enable-ckpt-migration --enable-checkpointing --with-hydra-ckpointlib=blcr --with-ftb=$HOME/ftb/ftb-0.6.2/build --disable-ckpt-aggregation 

Once I have started the FTB daemons (ftb_database_server on the front-end and ftb_agent on the compute nodes) and loaded the BLCR modules, I run the application with: 

mpirun_rsh -np 2 -hostfile mpd.hosts -sparehosts mpd.hostsMIG ./a.out 

where mpd.hosts is: 
compute-0-10 
compute-0-10 

and mpd.hostsMIG is: 
compute-0-9 
compute-0-9 

However, when I execute 
mv2_trigger compute-0-10 

or use the other way of triggering a migration ( pkill -SIGUSR2 mpispawn ), nothing happens. Only the FTB information is shown, and the job keeps running on the same node. 
I cannot run the " prelink --undo --all " command because I do not have root privileges. Could this be the cause of the problem? 
Am I doing something wrong? 

Thanks, 
Iván Cores. 



