[mvapich-discuss] Followup: mvapich2 issue regarding mpd timeout in mpiexec
Brian Curtis
curtisbr at cse.ohio-state.edu
Mon Jun 9 12:02:53 EDT 2008
David,
Thank you for this suggestion. We have enhanced the MPD mpiexec.py so
that the timeout scales with the number of processes, using a
configurable multiplier (default 0.05). The multiplier can be set via
the environment variable MV2_MPD_RECVTIMEOUT_MULTIPLIER. This
enhancement is now available in our MVAPICH2 svn trunk (r2668) and
1.0 branch (r2669).
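
For reference, the new behavior is roughly the following (a minimal
sketch; the function name and the lower bound on the timeout are
assumptions, not the actual mpiexec.py code):

    import os

    def mpd_recv_timeout(nprocs):
        # Scale the mpd ack timeout with the number of processes.
        # MV2_MPD_RECVTIMEOUT_MULTIPLIER and its 0.05 default are part
        # of the change described above; flooring at the old fixed
        # default of 20 seconds is an assumption for illustration.
        multiplier = float(os.environ.get('MV2_MPD_RECVTIMEOUT_MULTIPLIER', '0.05'))
        return max(20.0, multiplier * nprocs)

For example, a 3000-process job would wait up to 0.05 * 3000 = 150
seconds for each ack, instead of failing after a fixed 20 seconds.
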
Brian
On May 29, 2008, at 11:06 PM, David Kewley <David_Kewley at Dell.com> wrote:
> This is a followup to this thread:
>
> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2007-May/000834.html
>
> between Greg Bauer and Qi Gao.
>
> We had the same problem that Greg saw -- failure of mpiexec, with the
> characteristic error message "no msg recvd from mpd when expecting ack
> of request". It was resolved for us by setting recvTimeout in
> mpiexec.py to a higher value, just as Greg suggested and Qi concurred.
> The default value is 20; we chose 200. (We did not experiment with
> values between the two, so a lower value may work in many cases.)
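>
> Concretely, our fix is a single assignment in mpiexec.py (a sketch;
> the exact location of the line may differ between MVAPICH2 versions):
>
>     # mpd ack timeout, in seconds
>     recvTimeout = 200    # default is 20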
>
> I think this change should be made permanent in MVAPICH2. I do not
> think it will negatively impact anyone, because in the four places
> where this timeout is used, mpiexec makes an error exit immediately
> when the timeout expires anyway. So the worst consequence is that
> mpiexec would take longer to fail (3 minutes longer if 200 is used
> instead of 20, since 200 - 20 = 180 seconds). A user who encounters
> this timeout has to fix its root cause in order to get any work done,
> so they are not likely to hit it repeatedly and thereby lose lots of
> runtime simply because the timeout is large. Is this analysis correct?
>
> Meanwhile, this change would clearly help at least some people with
> large clusters. We see failures with the default recvTimeout between
> 900 and 1000 processes; a larger recvTimeout allows us to scale to
> 3000 processes and beyond.
>
> The default setting does not cause failure when I make a simple,
> direct call to mpiexec. I only see the failure when I use mpirun.lsf
> to launch a large job. I suspect the failure in the LSF case is due
> to the longer time it presumably takes to launch LSF's TaskStarter
> for every process, etc. The time required appears to be O(#processes)
> in the LSF case. (We have LSF 6.2, with a local custom wrapper script
> for TaskStarter.)
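>
> To put rough numbers on that (a back-of-the-envelope estimate from
> the figures above, not a measurement):
>
>     # With the default recvTimeout of 20 s and failures starting
>     # between 900 and 1000 processes, the implied per-process launch
>     # cost in our LSF setup is about:
>     per_proc = 20.0 / 950           # ~0.021 s per process
>     # At that rate, launching 3000 processes takes well over the
>     # fixed 20 s timeout:
>     print(3000 * per_proc)          # ~63 s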
>
> If you agree that this change to the value of recvTimeout is OK,
> please implement this one-line change in MVAPICH2, and consider
> contributing it upstream to MPICH2 as well.
>
> If you decline to make this change, at least it's now on the web that
> this change does fix the problem. :)
>
> Thanks,
> David
>
> David Kewley
> Dell Infrastructure Consulting Services
> Onsite Engineer at the Maui HPC Center
> Cell: 602-460-7617
> David_Kewley at Dell.com
>
> Dell Services: http://www.dell.com/services/
> How am I doing? Email my manager Russell_Kelly at Dell.com with any
> feedback.