[mvapich-discuss] [MVAPICH2] Suspend / Resume

wei huang huanwei at cse.ohio-state.edu
Sat Feb 17 11:23:42 EST 2007


Hi Yann,

Thanks for using mvapich2.

Could you clarify your question a bit more? Typically SIGSTOP pauses a
program and SIGCONT resumes it. Is this the behavior you are looking
for?
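
To make the plain SIGSTOP/SIGCONT behavior concrete, here is a minimal
sketch in Python that pauses and resumes an ordinary child process. The
child is only a stand-in for an MPI rank, and the function name is made
up for illustration. It shows the OS-level pause and resume only; it
does not show what happens to pinned or registered RDMA memory.

```python
import os
import signal
import time


def stop_and_resume_child():
    """Fork a child, pause it with SIGSTOP, resume it with SIGCONT.

    Hypothetical helper for illustration only. Returns a tuple
    (was_stopped, was_continued, was_killed).
    """
    pid = os.fork()
    if pid == 0:
        # Child: stand-in for an MPI rank, idles until it is killed.
        while True:
            time.sleep(1)

    # SIGSTOP cannot be caught, blocked, or ignored by the target.
    os.kill(pid, signal.SIGSTOP)
    _, status = os.waitpid(pid, os.WUNTRACED)
    was_stopped = os.WIFSTOPPED(status)

    # SIGCONT lets the stopped process run again.
    os.kill(pid, signal.SIGCONT)
    _, status = os.waitpid(pid, os.WCONTINUED)
    was_continued = os.WIFCONTINUED(status)

    # Clean up the child and reap it.
    os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)
    was_killed = os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL

    return was_stopped, was_continued, was_killed


if __name__ == "__main__":
    print(stop_and_resume_child())
```

Because the process never gets a chance to run a handler for SIGSTOP,
there is nothing for application or library code to hook at that point.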

If you want to suspend an MPI job and restart it later, may I suggest
using the checkpoint/restart feature of the latest mvapich2 release?
Detailed instructions can be found at:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html

Please note that you need BLCR (Berkeley Lab Checkpoint/Restart)
installed on your systems.

Let us know if we understood your question correctly.

Thanks.

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501


On Fri, 16 Feb 2007, Yann K. wrote:

> Hello everybody,
>
> While reading the mvapich2 gen2 code, I looked for routines that
> handle SIGSTOP and SIGCONT and couldn't find any. I work with an OFED
> stack and couldn't find anything that handles those signals at that
> level either. What happens to MPI processes that receive a SIGSTOP
> from LSF, mpd, or slurmd, especially if RDMA memory is pinned and
> already registered on the board?
>
> Thanks for any ideas.
>
> Yann K.