[mvapich-discuss] Process Termination Detection with mpirun_rsh
Tom Crockett
twcroc at wm.edu
Wed Aug 20 15:42:49 EDT 2008
Tom Crockett wrote:
> Following abnormal process termination on the remote node, there will be
> only one active rsh process and one defunct rsh process, confirming that
> the remote processes have cleaned up and exited. So it seems that
> mpirun_rsh is not responding properly to the death of a child process.
I've been poking around in the source code for mpirun_rsh and mpispawn,
and I think I've figured out what the trouble is -- I'm just not sure
what to do about it. mpispawn correctly notices that one of its
children has terminated abnormally, kills off all of its other children,
and exits with a non-zero return code. So far, so good.
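To make the mpispawn behavior above concrete, here is a minimal Python sketch of the same pattern: start several children, and if any one of them ends abnormally, kill the rest and report failure. It is not the mpispawn source; the function name `supervise` and the `poll_interval` parameter are made up for illustration, and polling is just one simple way to watch the children.

```python
import subprocess
import time


def supervise(commands, poll_interval=0.05):
    """Start every command; if one ends abnormally, kill the rest.

    Hypothetical helper, not taken from mpispawn.
    Returns 0 if all children exit with status 0, otherwise 1.
    """
    children = [subprocess.Popen(cmd) for cmd in commands]
    running = list(children)
    while running:
        for child in list(running):
            rc = child.poll()
            if rc is None:
                continue  # this child is still running
            running.remove(child)
            if rc != 0:
                # Non-zero exit, or death by signal (negative returncode):
                # kill the surviving children and reap them.
                for other in running:
                    other.kill()
                for other in running:
                    other.wait()
                return 1
        time.sleep(poll_interval)
    return 0
```

The return value of `supervise` plays the role of the spawner's own exit status, which is the piece of information that has to make it back to the launching node.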
Unfortunately, rsh (unlike ssh) does not propagate this return code back
as its own exit status, and instead exits with a return code of 0.
mpirun_rsh incorrectly interprets this to mean that the remote processes
have terminated normally. Instead of jumping into its cleanup procedure
to kill off the remaining processes in the job, it just sits around
waiting for its other children to exit, which they will never do without
outside intervention.
One potential workaround would be to use ssh instead of rsh, but we much
prefer to use rsh for spawning remote processes in our clusters. There
are two main reasons for this: (1) rsh is simpler, faster, easier to
configure, and less susceptible to breaking when users customize their
personal settings, and (2) as a rule we disallow ssh and rlogin access
to our compute nodes so that users will have fewer pathways to
circumvent the job scheduler.
So what is really needed is either a custom version of rsh which mirrors
the return status of its remote command, or else some other mechanism by
which mpispawn can notify mpirun_rsh when something bad happens to one
of its children. Does anyone know whether the former already exists somewhere?
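One way to approximate an rsh that mirrors the remote status, without modifying rsh itself, is to have the remote shell print its exit status as a final line of output and parse that line locally. The Python sketch below shows the idea. The sentinel string, the function name `run_mirroring_status`, and the fallback value 255 are all assumptions made for illustration; it also assumes the remote login shell is Bourne-compatible, so that `$?` holds the last exit status.

```python
import subprocess

MARKER = "__REMOTE_STATUS__="  # hypothetical sentinel, not part of rsh


def run_mirroring_status(launcher, command):
    """Run `command` through `launcher` and return (remote_status, output).

    `launcher` is an argv prefix such as ["rsh", "node01"]; the wrapped
    command string is appended to it as one final argument.
    """
    # Run the command in a subshell so that even a bare "exit 3" still
    # lets the outer shell reach the echo that reports the status.
    wrapped = f"( {command} ); echo {MARKER}$?"
    proc = subprocess.run(launcher + [wrapped],
                          stdout=subprocess.PIPE, text=True)
    lines = proc.stdout.splitlines()
    if lines and lines[-1].startswith(MARKER):
        status = int(lines[-1][len(MARKER):])
        return status, "\n".join(lines[:-1])
    # No marker line: the remote shell never reached the echo,
    # so treat the run as a failure (255 is an arbitrary choice).
    return 255, proc.stdout
```

To try this without a cluster, a stand-in launcher that always exits 0 behaves like the rsh case described above: `launcher = ["sh", "-c", 'sh -c "$1"; exit 0', "fake_rsh"]`. Running `exit 3` through it directly yields a return code of 0, while `run_mirroring_status(launcher, "exit 3")` recovers the 3.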
> Interestingly, whether the master node detects the remote process
> termination seems to depend on how the remote process dies. If I hit
> the remote process with a SIGTERM, mpirun_rsh seems to notice and things
> get cleaned up after a minute or two. If it terminates with something
> else (e.g., a SIGSEGV), the job will sit there forever.
I haven't dug deeply into this behavior yet, but I conjecture that the
SIGTERM is being caught by the MPI processes and is being handled in
MPI-land, whereas most other signals (such as SIGSEGV) are not being
trapped at the application level.
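The distinction in this conjecture can be illustrated with a small Python experiment: one child installs a SIGTERM handler and turns the signal into an ordinary exit, while another installs no handlers and is hit with SIGSEGV. This only demonstrates the general mechanism; it says nothing about which signals the MPI library actually traps, and the names `TRAPPING_CHILD`, `PLAIN_CHILD`, and `status_after_signal` are invented for the example.

```python
import signal
import subprocess
import sys

# Child that traps SIGTERM and turns it into an ordinary exit with status 1.
TRAPPING_CHILD = """
import signal, sys, time
signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(1))
print("ready", flush=True)
time.sleep(30)
"""

# Child that installs no signal handlers at all.
PLAIN_CHILD = """
import time
print("ready", flush=True)
time.sleep(30)
"""


def status_after_signal(child_source, signum):
    """Start a child, wait until it is ready, signal it, return its status."""
    proc = subprocess.Popen([sys.executable, "-c", child_source],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()  # blocks until the child prints "ready"
    proc.send_signal(signum)
    rc = proc.wait()
    proc.stdout.close()
    return rc
```

On a POSIX system, the trapping child ends with a normal exit status chosen by its handler, whereas the plain child is reported by `subprocess` as killed by the signal (a negative return code). A parent that only looks for non-zero exit codes from a well-behaved shutdown path would see these two cases quite differently.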
-Tom
--
Tom Crockett
College of William and Mary email: twcroc at wm.edu
IT/High Performance Computing Group phone: (757) 221-2762
Savage House fax: (757) 221-2023
P.O. Box 8795
Williamsburg, VA 23187-8795