[mvapich-discuss] Process Termination Detection with mpirun_rsh

Wed Aug 20 15:42:49 EDT 2008

Tom Crockett wrote:
> Following abnormal process termination on the remote node, there will be 
> only one active rsh process and one defunct rsh process, confirming that 
> the remote processes have cleaned up and exited.  So it seems that 
> mpirun_rsh is not responding properly to the death of a child process.

I've been poking around in the source code for mpirun_rsh and mpispawn, 
and I think I've figured out what the trouble is -- I'm just not sure 
what to do about it.  mpispawn correctly notices that one of its 
children has terminated abnormally, kills off all of its other children, 
and exits with a non-zero return code. So far, so good.

Unfortunately, rsh (unlike ssh) does not propagate this return code back 
as its own exit status, and instead exits with a return code of 0. 
mpirun_rsh incorrectly interprets this to mean that the remote processes 
have terminated normally.  Instead of jumping into its cleanup procedure 
to kill off the remaining processes in the job, it just sits around 
waiting for its other children to exit, which they will never do without 
outside intervention.
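
To make the failure mode concrete, here is a minimal Python sketch. It is 
not the mpirun_rsh code (which is C), and nothing is actually run remotely. 
The helpers launcher_like_rsh and launcher_like_ssh are hypothetical 
stand-ins: the first runs a child and then always exits 0, the second 
passes the child's status through. A parent that looks only at the 
launcher's exit status cannot tell the first case from a clean run.

```python
import subprocess
import sys

# Stand-in for the remote mpispawn: it exits with a non-zero status,
# as it would after one of its children died abnormally.
FAILING_CHILD = [sys.executable, "-c", "import sys; sys.exit(3)"]


def launcher_like_rsh(argv):
    """Hypothetical stand-in for rsh: run argv, discard its status, exit 0."""
    wrapper = (
        "import subprocess, sys; "
        "subprocess.call(sys.argv[1:]); "
        "sys.exit(0)"
    )
    return subprocess.call([sys.executable, "-c", wrapper] + argv)


def launcher_like_ssh(argv):
    """Hypothetical stand-in for ssh: run argv and exit with its status."""
    wrapper = (
        "import subprocess, sys; "
        "sys.exit(subprocess.call(sys.argv[1:]))"
    )
    return subprocess.call([sys.executable, "-c", wrapper] + argv)


if __name__ == "__main__":
    # What the parent sees in each case.
    print("rsh-like launcher status:", launcher_like_rsh(FAILING_CHILD))
    print("ssh-like launcher status:", launcher_like_ssh(FAILING_CHILD))
```

In the rsh-like case the parent receives 0 even though the child exited 
with 3, which is the situation mpirun_rsh appears to be in.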

One potential workaround would be to use ssh instead of rsh, but we much 
prefer to use rsh for spawning remote processes in our clusters.  There 
are two main reasons for this:  (1) rsh is simpler, faster, easier to 
configure, and less susceptible to breaking when users customize their 
personal settings, and (2) as a rule we disallow ssh and rlogin access 
to our compute nodes so that users will have fewer pathways to 
circumvent the job scheduler.

So what is really needed is either a custom version of rsh which mirrors 
the return status of its remote command, or else some other mechanism by 
which mpispawn can notify mpirun_rsh when something bad happens to one 
of its children.  Does anyone know whether the former already exists 
somewhere?
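
One way the "mirror the return status" idea could be approximated without 
modifying rsh is to have the remote shell print the command's status as a 
final marker line and parse it on the local side. The Python sketch below 
is only an illustration under several assumptions: the marker string and 
the function name run_and_recover_status are made up, the remote login 
shell is assumed to be Bourne-compatible (so that $? works), and stdout is 
assumed to be otherwise unused for binary data. It is not something 
mpirun_rsh or mpispawn does today.

```python
import subprocess

# Hypothetical sentinel; not part of rsh, mpirun_rsh, or mpispawn.
MARKER = "__REMOTE_EXIT_STATUS__="


def run_and_recover_status(launcher, command):
    """Run `command` through `launcher` and recover its real exit status.

    `launcher` is an argv prefix such as ["rsh", "node01"] (host name is
    illustrative). The shell that runs the command echoes its status as a
    marker, so the launcher's own exit status is never consulted.
    """
    wrapped = "( %s ); echo %s$?" % (command, MARKER)
    proc = subprocess.run(
        launcher + [wrapped],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    status = None
    output_lines = []
    for line in proc.stdout.splitlines():
        idx = line.find(MARKER)
        if idx >= 0:
            status = int(line[idx + len(MARKER):])
            if line[:idx]:
                # Command output that did not end with a newline.
                output_lines.append(line[:idx])
        else:
            output_lines.append(line)
    if status is None:
        # No marker arrived: the launcher or the connection failed.
        status = 255
    return status, output_lines


if __name__ == "__main__":
    # ["sh", "-c"] stands in for ["rsh", "<host>"] so this runs locally.
    print(run_and_recover_status(["sh", "-c"], "echo hello; exit 7"))
```

The obvious drawback is that the marker shares stdout with the 
application, so a separate notification channel from mpispawn back to 
mpirun_rsh would be cleaner if one can be arranged.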

> Interestingly, whether the master node detects the remote process 
> termination seems to depend on how the remote process dies.  If I hit 
> the remote process with a SIGTERM, mpirun_rsh seems to notice and things 
> get cleaned up after a minute or two.  If it terminates with something 
> else (e.g., a SIGSEGV), the job will sit there forever.

I haven't dug deeply into this behavior yet, but I conjecture that the 
SIGTERM is being caught by the MPI processes and is being handled in 
MPI-land, whereas most other signals (such as SIGSEGV) are not being 
trapped at the application level.
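
The difference this conjecture relies on can be shown with a small Python 
sketch. It says nothing about what MVAPICH actually installs; the child 
below is a hypothetical process that traps SIGTERM and exits in an orderly 
way (the exit code 42 is arbitrary), while leaving SIGSEGV at its default 
disposition. The parent sees a normal exit in the first case and a death 
by signal in the second.

```python
import signal
import subprocess
import sys

# Hypothetical child: traps SIGTERM and exits with an arbitrary code (42).
# SIGSEGV is left at its default action, which terminates the process.
CHILD = (
    "import signal, sys, time\n"
    "signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(42))\n"
    "print('ready', flush=True)\n"
    "time.sleep(30)\n"
)


def kill_child_with(signum):
    """Start the child, send it `signum`, and return what the parent sees."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE)
    proc.stdout.readline()  # wait until the SIGTERM handler is installed
    proc.send_signal(signum)
    returncode = proc.wait()
    proc.stdout.close()
    return returncode


if __name__ == "__main__":
    # A trapped signal yields an ordinary exit code; an untrapped one yields
    # a negative return code (minus the signal number) from subprocess.
    print("SIGTERM:", kill_child_with(signal.SIGTERM))
    print("SIGSEGV:", kill_child_with(signal.SIGSEGV))
```

If the MPI processes do something similar for SIGTERM, they would get a 
chance to shut down the job themselves, which would be consistent with 
the cleanup observed after a minute or two.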

-Tom

-- 
Tom Crockett

College of William and Mary               email:  twcroc at wm.edu
IT/High Performance Computing Group       phone:  (757) 221-2762
Savage House                              fax:    (757) 221-2023
P.O. Box 8795
Williamsburg, VA  23187-8795