[mvapich-discuss] suggested minor feature
Mark Potts
potts at hpcapplications.com
Wed Sep 5 11:14:04 EDT 2007
Hi,
Despite the many issues I've raised about MVAPICH job cleanup and
timeouts (all of which now appear to be resolved), I'd like to raise
another, related issue: a suggestion.
We've found that in a job where all processes correctly call
MPI_Finalize() at the end of their communication stages, no
process can be allowed to terminate if even a single thread
needs to keep working. That is, once MPI_Finalize() has been
called and any process terminates normally, the remaining
processes have only a 10-second window in which to run before
mpirun_rsh kills them. This is a problem for codes that finish
the job's task in serial mode, and for cases where a process
needs to be debugged after MPI_Finalize().
The suggestion would be to make the timeout period (currently
10 seconds) a VIADEV_* environment variable with a default of 10,
which users could raise when 10 seconds is too little time for a
remaining process. By the same token, the variable could be used
to shorten the timeout when a user considers 10 seconds not
aggressive enough.
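A minimal sketch of how such a setting might be read, in C. This is
not MVAPICH code: the variable name VIADEV_FINALIZE_TIMEOUT and the
function get_finalize_timeout() are hypothetical, chosen only to
illustrate the idea. The sketch keeps today's 10 seconds as the
default and falls back to it when the value is unset or malformed.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical name: not an existing MVAPICH environment variable. */
#define FINALIZE_TIMEOUT_ENV     "VIADEV_FINALIZE_TIMEOUT"
#define FINALIZE_TIMEOUT_DEFAULT 10   /* current fixed window, seconds */

/* Return the post-finalize grace period in seconds: the environment
 * value if it parses as a non-negative integer, otherwise the default. */
int get_finalize_timeout(void)
{
    const char *s = getenv(FINALIZE_TIMEOUT_ENV);
    if (s == NULL || *s == '\0')
        return FINALIZE_TIMEOUT_DEFAULT;      /* unset: keep old behavior */

    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    /* Reject overflow, trailing junk, negatives and out-of-range values. */
    if (errno != 0 || *end != '\0' || v < 0 || v > INT_MAX) {
        fprintf(stderr, "ignoring invalid %s=\"%s\", using %d s\n",
                FINALIZE_TIMEOUT_ENV, s, FINALIZE_TIMEOUT_DEFAULT);
        return FINALIZE_TIMEOUT_DEFAULT;
    }
    return (int)v;
}
```

A user needing a long serial tail could then run with, say,
VIADEV_FINALIZE_TIMEOUT=600, and the launcher would wait that long
instead of the fixed 10 seconds; a value such as 2 would tighten
cleanup instead.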
regards,
--
***********************************
>> Mark J. Potts, PhD
>>
>> HPC Applications Inc.
>> phone: 410-992-8360 Bus
>> 410-313-9318 Home
>> 443-418-4375 Cell
>> email: potts at hpcapplications.com
>> potts at excray.com
***********************************