[mvapich-discuss] checkpointing failure ...
biswajit at crlindia.com
Wed Jun 11 04:40:43 EDT 2008
While running HPL on MVAPICH2 1.0.2 with checkpointing enabled, the
program crashed with the
following errors:
1. 2: reregister dentry 0x642010, addr 0x2bc6578000 pagebase_low_p,
10121216 register_nbytes
[2] Abort: reregister fails
at line 1104 in file dreg.c
rank 2 in job 1 n163_32790 caused collective abort of all ranks
exit status of rank 2: killed by signal 9
[mpiexec_cr][/home/biswajit/mvapichBlcrInstall/mvapich2-1.0.2ckpt1/src/pm/mpd/mpiexec_cr.c:
line 196]abort: checkpoint failed
The restart then fails as well, giving the following error:
cri_syscall(CR_OP_RSTRT_PROCS): Invalid argument