[mvapich-discuss] Process Termination Detection with mpirun_rsh

Wed Aug 20 15:56:52 EDT 2008

Hi Tom and Fred,

Thanks for reporting this issue and for sending us the follow-up
comments. We started some internal discussions here this morning and
suspected that the choice of rsh vs. ssh was playing a role. That seems
to be the case. We will look for a solution to this rsh-related problem
and get back to you in a few days.

Thanks,

DK

On Wed, 20 Aug 2008, Tom Crockett wrote:

> Tom Crockett wrote:
> > Following abnormal process termination on the remote node, there will be
> > only one active rsh process and one defunct rsh process, confirming that
> > the remote processes have cleaned up and exited.  So it seems that
> > mpirun_rsh is not responding properly to the death of a child process.
>
> I've been poking around in the source code for mpirun_rsh and mpispawn,
> and I think I've figured out what the trouble is -- I'm just not sure
> what to do about it.  mpispawn is correctly noticing that one of its
> children has terminated abnormally, kills off all of its other children,
> and exits with a non-zero return code. So far, so good.
>
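The mpispawn behavior described above can be sketched as follows: wait for the children, and on the first abnormal termination kill the rest and exit with a non-zero status. This is a hypothetical Python illustration, not mpispawn's actual source. The function name `supervise` and the choice to report death by signal N as 128 + N (the usual shell convention) are assumptions.

```python
import os
import signal


def supervise(argvs):
    """Run every argv in argvs; stop at the first abnormal exit.

    Hypothetical illustration, not mpispawn's actual code. Returns 0 if all
    children exit with status 0; otherwise returns the first failing child's
    exit status, or 128 + N if it was killed by signal N (shell convention).
    """
    children = set()
    for argv in argvs:
        children.add(os.posix_spawnp(argv[0], argv, os.environ))

    while children:
        pid, raw = os.wait()                 # reap whichever child ends first
        if pid not in children:
            continue                         # not one of ours; ignore it
        children.remove(pid)
        if os.WIFSIGNALED(raw):              # e.g. the child died of SIGSEGV
            code = 128 + os.WTERMSIG(raw)
        else:
            code = os.WEXITSTATUS(raw)
        if code != 0:
            for other in children:           # abnormal exit: kill the siblings
                os.kill(other, signal.SIGKILL)
                os.waitpid(other, 0)         # and reap them so none linger
            return code
    return 0
```

For example, `supervise([["sh", "-c", "exit 3"], ["sleep", "30"]])` kills the sleeping sibling and returns 3. In the scenario reported here, that non-zero status is exactly what rsh then fails to pass back to mpirun_rsh.
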
> Unfortunately, rsh (unlike ssh) does not propagate this return code back
> as its own exit status, and instead exits with a return code of 0.
> mpirun_rsh incorrectly interprets this to mean that the remote processes
> have terminated normally.  Instead of jumping into its cleanup procedure
> to kill off the remaining processes in the job, it just sits around
> waiting for its other children to exit, which they will never do without
> outside intervention.
>
> One potential workaround would be to use ssh instead of rsh, but we much
> prefer to use rsh for spawning remote processes in our clusters.  There
> are two main reasons for this:  (1) rsh is simpler, faster, easier to
> configure, and less susceptible to breaking when users customize their
> personal settings, and (2) as a rule we disallow ssh and rlogin access
> to our compute nodes so that users will have fewer pathways to
> circumvent the job scheduler.
>
> So what is really needed is either a custom version of rsh which mirrors
> the return status of its remote command, or else some other mechanism by
> which mpispawn can notify mpirun_rsh when something bad happens to one
> of its children.  I'm curious whether the former already exists somewhere.
>
>
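On the idea of an rsh that mirrors the remote status: a well-known trick is to have the remote shell print the command's exit status as a final, recognizable line and have the local side parse it, instead of trusting rsh's own exit code. Below is a minimal Python sketch of that idea, not an existing tool. The sentinel string, the function name `run_mirroring_status`, and the host name are hypothetical, and the sketch assumes a Bourne-compatible remote login shell (so that `$?` holds the status).

```python
import subprocess
import sys

# Hypothetical sentinel; assumed never to appear in the program's own output.
MARKER = "__REMOTE_STATUS__="


def run_mirroring_status(transport, command, out=sys.stdout):
    """Run `command` through `transport` and return the remote exit status.

    `transport` is an argv prefix such as ["rsh", "node01"] (hypothetical
    host) that hands its last argument to the remote login shell. That
    shell is assumed to be Bourne-compatible so that $? holds the status.
    """
    wrapped = f'{command}; echo "{MARKER}$?"'
    status = None
    with subprocess.Popen(transport + [wrapped],
                          stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:             # pass output through line by line
            head, sep, tail = line.rpartition(MARKER)
            if sep:                          # the sentinel line carries the status
                out.write(head)              # keep any unterminated output before it
                status = int(tail)
            else:
                out.write(line)
    if status is None:
        # No sentinel means the remote shell never reached the echo (for
        # example, the connection dropped), so report a failure instead of
        # trusting the transport's own, possibly zero, exit status.
        return proc.returncode if proc.returncode > 0 else 255
    return status
```

For local experiments, `["sh", "-c"]` can stand in for the rsh prefix: its own exit status is that of the trailing echo (normally 0), much like rsh's uninformative 0, yet the function still recovers the inner command's status. The sketch assumes the command does not itself call `exit`, since that would skip the final echo and be reported as 255.
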
> > Interestingly, whether the master node detects the remote process
> > termination seems to depend on how the remote process dies.  If I hit
> > the remote process with a SIGTERM, mpirun_rsh seems to notice and things
> > get cleaned up after a minute or two.  If it terminates with something
> > else (e.g., a SIGSEGV), the job will sit there forever.
>
> I haven't dug deeply into this behavior yet, but I conjecture that the
> SIGTERM is being caught by the MPI processes and is being handled in
> MPI-land, whereas most other signals (such as SIGSEGV) are not being
> trapped at the application level.
>
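The conjecture above can be illustrated at the level of the parent's wait status. A process that catches SIGTERM gets to run its own cleanup code and exit deliberately, while a process with no SIGSEGV handler is killed outright and runs no cleanup at all. The hypothetical Python sketch below shows that difference. The child program, its exit status of 15, and the function name `end_child_with` are illustrative assumptions, not a description of how MVAPICH actually handles signals.

```python
import signal
import subprocess
import sys

# Hypothetical child program: it installs a SIGTERM handler (standing in for
# whatever handler an MPI library might register), reports that it is ready,
# and then waits to be signalled. It installs no SIGSEGV handler.
CHILD = r"""
import signal, sys, time

def on_term(signum, frame):
    # Library-level cleanup would run here before a deliberate exit.
    sys.exit(15)

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
while True:
    time.sleep(1)
"""


def end_child_with(sig):
    """Start the child, deliver `sig` to it, and describe how it ended."""
    with subprocess.Popen([sys.executable, "-c", CHILD],
                          stdout=subprocess.PIPE, text=True) as child:
        child.stdout.readline()              # wait until the handler is installed
        child.send_signal(sig)
        rc = child.wait()
    if rc < 0:                               # Popen reports "killed by signal N" as -N
        return f"killed by {signal.Signals(-rc).name}"
    return f"exited with status {rc}"
```

Here `end_child_with(signal.SIGTERM)` reports a deliberate exit with status 15, while `end_child_with(signal.SIGSEGV)` reports that the child was killed by SIGSEGV.
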
> -Tom
>
> --
> Tom Crockett
>
> College of William and Mary               email:  twcroc at wm.edu
> IT/High Performance Computing Group       phone:  (757) 221-2762
> Savage House                              fax:    (757) 221-2023
> P.O. Box 8795
> Williamsburg, VA  23187-8795
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>