[mvapich-discuss] checkpointing failure ... (fwd)

wei huang huanwei at cse.ohio-state.edu
Fri Nov 23 10:29:58 EST 2007


Hi Biswajit,

> > While restarting a process after checkpointing,
> > only a few processes (not all of them) start up.
> > What could be the reason?

We have never seen this problem on our platform. Could you please give us
more detailed information, such as the exact steps you follow for
checkpoint/restart (C/R)? Also, is mpd running on the node where you
restart? Is BLCR installed properly on all nodes? Each time (assuming you
have tried multiple times), are the processes that do start randomly
distributed across all nodes, or only on a few specific ones?

> > My second problem is:
> > While compiling a program with application-initiated synchronous
> > checkpointing (using MVAPICH2_Sync_Checkpoint()), I get the following
> > error message:
> >    : undefined reference to `MVAPICH2_Sync_Checkpoint'
> >
> > Is there a header file I need to include, or a library I need to link with?

Currently this function is guarded by the compile-time flag SYNC_CKPT, so
it is not built into the library by default. Please add -DSYNC_CKPT to the
CFLAGS in make.mvapich2.ofa and recompile MVAPICH2.
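A minimal sketch of the application side, under stated assumptions: the prototype `int MVAPICH2_Sync_Checkpoint(void)` is assumed (declare it yourself if your MVAPICH2 headers do not), and `HAVE_MVAPICH2_SYNC_CKPT` and `request_checkpoint()` are hypothetical names introduced here for illustration. The idea is that the application only references the symbol when it is built against an MVAPICH2 that was recompiled with -DSYNC_CKPT, which avoids the "undefined reference" link error otherwise.

```c
#include <stdio.h>

/*
 * Assumption: the application is compiled with -DHAVE_MVAPICH2_SYNC_CKPT
 * (a hypothetical, application-side macro) only when the MVAPICH2 library
 * it links against was itself rebuilt with -DSYNC_CKPT. Without that
 * rebuild the symbol below is absent and linking fails with
 * "undefined reference to `MVAPICH2_Sync_Checkpoint'".
 */
#ifdef HAVE_MVAPICH2_SYNC_CKPT
/* Assumed prototype; declared here in case the MVAPICH2 headers do not. */
extern int MVAPICH2_Sync_Checkpoint(void);
#endif

/*
 * Hypothetical helper: request an application-initiated synchronous
 * checkpoint. Because the checkpoint is synchronous, every MPI rank is
 * assumed to call this at the same point in the program, after MPI_Init().
 * Returns the library's result, or -1 if checkpoint support was not built in.
 */
int request_checkpoint(void)
{
#ifdef HAVE_MVAPICH2_SYNC_CKPT
    return MVAPICH2_Sync_Checkpoint();
#else
    fprintf(stderr, "checkpointing not enabled at build time; skipping\n");
    return -1;
#endif
}
```

For example, every rank might call request_checkpoint() after each fixed number of iterations of its main loop. Building the application without the macro keeps it linkable against an MVAPICH2 that lacks the feature; the checkpoint request then becomes a logged no-op.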

Thanks.

-- Wei
