[mvapich-discuss] Cleanup of killed jobs with torque+mvapich2

Sourav Chakraborty chakraborty.52 at buckeyemail.osu.edu
Mon Jul 30 21:44:30 EDT 2018


Hi Noam,

1. Yes, mpirun_rsh is the recommended launcher. It does not depend on any
particular queuing system and provides the best performance.

2. We have seen in the past that killing a job through qdel does not always
clean up the processes on all nodes. Many HPC clusters use epilogue
scripts to ensure that all processes from a job are cleaned up. If that
is not feasible, you can try using mpiexec, which has better integration
with Torque.

You need to configure MVAPICH2 with the following parameters:

--with-pbs=/opt/torque --with-pbs-lib=/opt/torque/lib64
--with-pbs-include=/opt/torque/include

Replace /opt/torque with the correct installation path for your system.
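
As a rough sketch of how the flags above fit together (the helper name
torque_configure_flags is made up for illustration, and the lib64/include
subdirectories are assumed to sit directly under the Torque prefix, as in
the example paths), something like this keeps the prefix in one place:

```shell
# Hypothetical helper: print the three Torque-related configure flags
# for a given installation prefix (defaults to /opt/torque).
# Assumes the prefix contains no whitespace.
torque_configure_flags() {
    prefix="${1:-/opt/torque}"
    printf '%s\n' \
        "--with-pbs=${prefix}" \
        "--with-pbs-lib=${prefix}/lib64" \
        "--with-pbs-include=${prefix}/include"
}

# Usage, from the top of the MVAPICH2 source tree (left as a comment so
# that pasting this block does not start a build):
#   ./configure $(torque_configure_flags /usr/local/torque)
```

The command substitution is deliberately unquoted so that the shell splits
the output into three separate arguments for configure.
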
Please refer to the user guide for more details about using mpiexec:
http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3-userguide.html#x1-370005.2.2
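
If you go the epilogue route mentioned in point 2 instead, a minimal sketch
could look like the following. This is only an illustration, not something
shipped with MVAPICH2. It assumes that Torque passes the job owner's user
name as the second argument to the epilogue script, and that nodes are
allocated exclusively, so the user has no other jobs on the node that must
survive. The function name cleanup_user_processes is made up. To reach the
non-head nodes you would most likely need to install it as the parallel
epilogue on each compute node as well; please check the Torque
documentation for your version.

```shell
#!/bin/sh
# Sketch of a Torque epilogue that kills processes left behind by a job.
# Assumption: $2 is the user name the job ran as.

# Kill every process owned by the given user.
# Returns 1 without killing anything if the user name is empty or root.
cleanup_user_processes() {
    job_user="$1"
    case "$job_user" in
        ""|root) return 1 ;;
    esac
    # pkill exits non-zero when no process matched; that is not an error here.
    pkill -KILL -u "$job_user" || true
    return 0
}

# Only act when the script is actually invoked with epilogue arguments.
if [ "$#" -ge 2 ]; then
    cleanup_user_processes "$2"
fi
```

On nodes shared between jobs this is too aggressive, since it does not
distinguish between processes of different jobs belonging to the same user.
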

Thanks,
Sourav


On Mon, Jul 30, 2018 at 7:28 PM Noam Bernstein <noam.bernstein at nrl.navy.mil>
wrote:

> Hi - I’ve been starting to do some tests with mvapich2 (on CentOS 6 and
> the CentOS OFED packages for IB), and I’ve been having a hard time figuring
> out two things:
>
> 1. Is mpirun_rsh still the canonical way of starting jobs?  Does that
> depend on the queuing system?
> 2. What should I expect in terms of cleanup of torque/OpenPBS jobs killed
> with qdel?  My empirical observation is that processes on the head node are
> all killed, but on other nodes they continue to run, which is not really
> acceptable for us.
>
>
> Does anyone have any useful suggestions on either of these?
>
> thanks,
> Noam