[mvapich-discuss] checkpointing failure ...

biswajit at crlindia.com biswajit at crlindia.com
Wed Jun 11 04:40:43 EDT 2008


While running HPL with checkpointing enabled in MVAPICH2 1.0.2, the
program crashed with the following errors:

  1. 2: reregister dentry 0x642010, addr 0x2bc6578000 pagebase_low_p, 
10121216 register_nbytes
  [2] Abort: reregister fails
  at line 1104 in file dreg.c
  rank 2 in job 1  n163_32790   caused collective abort of all ranks
   exit status of rank 2: killed by signal 9 
 
[mpiexec_cr][/home/biswajit/mvapichBlcrInstall/mvapich2-1.0.2ckpt1/src/pm/mpd/mpiexec_cr.c: 
line 196]abort: checkpoint failed




Restarting also fails, with the following error:

     cri_syscall(CR_OP_RSTRT_PROCS): Invalid argument

