[mvapich-discuss] Cleanup of killed jobs with torque+mvapich2

Noam Bernstein noam.bernstein at nrl.navy.mil
Tue Jul 31 08:16:12 EDT 2018


> On Jul 30, 2018, at 9:44 PM, Sourav Chakraborty <chakraborty.52 at buckeyemail.osu.edu> wrote:
> 
> Hi Noam,
> 
> 1. mpirun_rsh is the recommended launcher since it is not dependent on any particular queuing system and provides the best performance.
> 
> 2. We have seen in the past that killing a job through qdel does not always clean up the processes on all nodes. Many HPC clusters utilize epilogue scripts to ensure that all processes from that job are cleaned up. If that is not feasible, you can try using mpiexec which has better integration with torque.

If you can suggest any example epilogue scripts, I would be happy to try them, but I’m not sure how else to identify the processes started by mpirun_rsh.  I guess I could kill all processes that don’t belong to root or other system accounts.
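
For what it's worth, here is a minimal sketch of the kind of epilogue I have in mind. It assumes that Torque passes the job owner's user name as the second argument to the epilogue, that each node runs only one job per user at a time (otherwise this would also kill that user's other jobs), and that pkill is available on the nodes. The function names, the DRY_RUN variable, and the list of protected accounts are my own placeholders, not anything defined by Torque or MVAPICH2.

```shell
#!/bin/sh
# Hypothetical Torque epilogue sketch: remove a job owner's leftover processes.
# Assumption: $1 is the job id and $2 is the job owner's user name.

# Succeed if the given user name is one we must never clean up.
# The account list is illustrative; adjust it for the local system.
is_protected_user() {
    case "$1" in
        ""|root|daemon|bin|nobody) return 0 ;;
        *) return 1 ;;
    esac
}

# cleanup_user USER: send SIGTERM, wait, then SIGKILL to USER's processes.
# DRY_RUN defaults to 1, so nothing is killed unless DRY_RUN=0 is set.
cleanup_user() {
    user=$1
    if is_protected_user "$user"; then
        echo "skip: refusing to clean up '$user'"
        return 0
    fi
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would kill: processes owned by $user"
        return 0
    fi
    pkill -TERM -u "$user"   # ask processes to exit cleanly first
    sleep 5
    pkill -KILL -u "$user"   # then force whatever is still running
    return 0
}

# Act only when a user name was actually passed as the second argument.
if [ -n "${2:-}" ]; then
    cleanup_user "$2"
fi
```

With DRY_RUN left at its default of 1 the script only reports what it would do; setting DRY_RUN=0 in the epilogue's environment makes it actually send the signals.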

> 
> You need to configure MVAPICH2 with the following parameters:
> 
> --with-pbs=/opt/torque --with-pbs-lib=/opt/torque/lib64 --with-pbs-include=/opt/torque/include
> 
> Replace /opt/torque with the correct installation path for your system. Please refer to the user guide for more details about using mpiexec: http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3-userguide.html#x1-370005.2.2

Great, thanks.  That’s very helpful.

									Noam


