[mvapich-discuss] cr_restart does not work when you change the hosts file
Александр Твеленев
santvel at mail.ru
Wed Jul 20 04:30:28 EDT 2011
Hello group,
I run my application with the following command:
"/home/santvel/BLCR_MVAPICH2/MVAPICH2-1.7/bin/mpirun_rsh -np 2 --hostfile /home/santvel/BLCR_MVAPICH2/MVAPICH2-1.7/bin/hosts MV2_CKPT_FILE=/home/santvel/TEST2/hand/temp MV2_CKPT_MAX_SAVE_CKPTS=10 /home/santvel/TEST2/TEST"
Hosts file content:
>opt02
>opt02
I then create a checkpoint with the command: "mv2_checkpoint 4784"
After a failure, I decided to restart the job on node opt06. To do this, I changed the hosts file.
New hosts file content:
>opt06
>opt06
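The hosts-file change described above can be sketched as a small shell helper. The function name write_hosts and the temporary path are hypothetical and are not part of MVAPICH2 or BLCR. The sketch only writes one host line per rank and makes no claim about whether a checkpoint taken on one node can be restarted on another.

```shell
# Hypothetical helper (not an MVAPICH2 or BLCR tool): write a hosts file
# containing one line per MPI rank, all pointing at the same node.
write_hosts() {
    node="$1"      # node name, e.g. opt06
    nprocs="$2"    # number of ranks, matching the -np value
    out="$3"       # path of the hosts file to (re)create
    : > "$out"     # truncate or create the file
    i=0
    while [ "$i" -lt "$nprocs" ]; do
        printf '%s\n' "$node" >> "$out"
        i=$((i + 1))
    done
}

# Illustrative path only; the real hosts file lives wherever --hostfile points.
hosts_file="${TMPDIR:-/tmp}/hosts.example"
write_hosts opt06 2 "$hosts_file"
cat "$hosts_file"
```

Writing to a scratch path first makes it easy to inspect the result before replacing the file that mpirun_rsh was started with.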
I then ran cr_restart and got an error message:
"
[santvel@opt06 bin]$ cr_restart context.4784
Restarting...
[opt06:mpispawn_0][child_handler] MPI process (rank: 1, pid: 5082) terminated with signal 11 -> abort job
[opt06:mpirun_rsh][wait_for_mpispawn] mpispawn_0 from node opt06 aborted: MPI process error (1)
"
How can I fix this error and restore the program after a crash?
Kind regards, Alexandr.