[mvapich-discuss] suggested minor feature
Dhabaleswar Panda
panda at cse.ohio-state.edu
Wed Sep 5 13:25:57 EDT 2007
Hi Mark,
> Despite the many issues I've raised about MVAPICH job cleanup and
> timeouts (all resolved now it appears), I'd like to raise another
> related issue -- a suggestion.
Glad to know that the issues appear to be resolved now.
Thanks for the following suggestion. We will work on this enhancement
and will send you a patch soon.
> We've found that in a job where all processes correctly call
> MPI_Finalize() at the end of their communication stages, no
> process can be permitted to terminate if even a single thread
> needs to continue working. That is, once MPI_Finalize() has been
> called and any process terminates, the remaining processes get
> only a 10 second window in which to run before mpirun_rsh kills
> the remaining children. This presents a problem for codes that
> naturally complete the job's task in serial mode, and for codes
> in which a process needs to be debugged after MPI_Finalize().
>
> The suggestion would be:
> to provide the timeout period (currently 10 seconds) as a
> VIADEV_* env variable, with a default of 10, which users could
> then raise when 10 seconds is too little time for a remaining
> process. By the same token, this env variable could be used
> to trim the timeout period to a smaller value when a user
> deems 10 seconds not aggressive enough.
> regards,
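For illustration only, a minimal sketch of how a launcher might read such a
timeout from the environment, falling back to the current 10 seconds. This is
not the MVAPICH patch: the variable name VIADEV_CLEANUP_TIMEOUT and the
function cleanup_timeout_seconds() are hypothetical (the suggestion above only
says "VIADEV_*"), and treating unset, empty, non-numeric, or negative values as
"use the default" is an assumption.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Current behavior described in the thread: a 10 second window. */
#define DEFAULT_CLEANUP_TIMEOUT_SEC 10

/* Hypothetical variable name; the thread only specifies the VIADEV_* prefix. */
#define CLEANUP_TIMEOUT_ENV "VIADEV_CLEANUP_TIMEOUT"

/*
 * Return the cleanup timeout in seconds. Falls back to the default when the
 * variable is unset, empty, not a whole number, negative, or out of int range.
 */
int cleanup_timeout_seconds(void)
{
    const char *value = getenv(CLEANUP_TIMEOUT_ENV);
    char *end = NULL;
    long parsed;

    if (value == NULL || *value == '\0')
        return DEFAULT_CLEANUP_TIMEOUT_SEC;

    errno = 0;
    parsed = strtol(value, &end, 10);

    /* Reject overflow, trailing garbage, and values that make no sense. */
    if (errno != 0 || *end != '\0' || parsed < 0 || parsed > INT_MAX)
        return DEFAULT_CLEANUP_TIMEOUT_SEC;

    return (int)parsed;
}
```

With something along these lines, a user could export the variable with a
larger value before launching a job whose last process finishes in serial
mode, or a smaller value for more aggressive cleanup.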
Best Regards,
DK
> ***********************************
> >> Mark J. Potts, PhD
> >>
> >> HPC Applications Inc.
> >> phone: 410-992-8360 Bus
> >> 410-313-9318 Home
> >> 443-418-4375 Cell
> >> email: potts at hpcapplications.com
> >> potts at excray.com
> ***********************************