[mvapich-discuss] Followup: mvapich2 issue regarding mpd timeout in mpiexec
Brian Curtis
curtisbr at cse.ohio-state.edu
Mon Jun 9 12:02:53 EDT 2008
David,
Thank you for this suggestion. We have enhanced the MPD mpiexec.py so
that the timeout scales with the number of processes, using a
configurable multiplier (default 0.05). The multiplier can be set via
the environment variable MV2_MPD_RECVTIMEOUT_MULTIPLIER. This
enhancement is now available in our MVAPICH2 svn trunk (r2668) and
1.0 branch (r2669).
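
For reference, the new behavior is roughly the following (a minimal
sketch; the function name and the lower bound on the timeout are
assumptions, not the actual mpiexec.py code):

    import os

    def mpd_recv_timeout(nprocs):
        # Scale the mpd ack timeout with the number of processes.
        # MV2_MPD_RECVTIMEOUT_MULTIPLIER and its 0.05 default are part
        # of the change described above; flooring at the old fixed
        # default of 20 seconds is an assumption for illustration.
        multiplier = float(os.environ.get('MV2_MPD_RECVTIMEOUT_MULTIPLIER', '0.05'))
        return max(20.0, multiplier * nprocs)

For example, a 3000-process job would wait up to 0.05 * 3000 = 150
seconds for each ack, instead of failing after a fixed 20 seconds.
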
Brian
On May 29, 2008, at 11:06 PM, David Kewley <David_Kewley at Dell.com> wrote:
> This is a followup to this thread:
>
> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2007-May/000834.html
>
> between Greg Bauer and Qi Gao.
>
> We had the same problem that Greg saw -- failure of mpiexec, with the
> characteristic error message "no msg recvd from mpd when expecting ack
> of request". It was resolved for us by setting recvTimeout in
> mpiexec.py to a higher value, just as Greg suggested and Qi concurred.
> The default value is 20; we chose 200. (We did not experiment with
> values between the two, so a lower value may work in many cases.)
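>
> Concretely, our fix is a single assignment in mpiexec.py (a sketch;
> the exact location of the line may differ between MVAPICH2 versions):
>
>     # mpd ack timeout, in seconds
>     recvTimeout = 200    # default is 20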
>
> I think this change should be made permanent in MVAPICH2. I do not
> think it will negatively impact anyone, because in the four places
> where this timeout is used, mpiexec makes an error exit immediately
> when the timeout expires anyway. So the worst consequence is that
> mpiexec would take longer to fail (3 minutes longer if 200 is used
> instead of 20, since 200 - 20 = 180 seconds). A user who encounters
> this timeout has to fix its root cause in order to get any work done,
> so they are not likely to hit it repeatedly and thereby lose lots of
> runtime simply because the timeout is large. Is this analysis correct?
>
> Meanwhile, this change would clearly help at least some people with
> large clusters. We see failures with the default recvTimeout between
> 900 and 1000 processes; a larger recvTimeout allows us to scale to
> 3000 processes and beyond.
>
> The default setting does not cause failure when I make a simple,
> direct call to mpiexec. I only see the failure when I use mpirun.lsf
> to launch a large job. I suspect the failure in the LSF case is due
> to the longer time it presumably takes to launch LSF's TaskStarter
> for every process, etc. The time required appears to be O(#processes)
> in the LSF case. (We have LSF 6.2, with a local custom wrapper script
> for TaskStarter.)
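>
> To put rough numbers on that (a back-of-the-envelope estimate from
> the figures above, not a measurement):
>
>     # With the default recvTimeout of 20 s and failures starting
>     # between 900 and 1000 processes, the implied per-process launch
>     # cost in our LSF setup is about:
>     per_proc = 20.0 / 950           # ~0.021 s per process
>     # At that rate, launching 3000 processes takes well over the
>     # fixed 20 s timeout:
>     print(3000 * per_proc)          # ~63 s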
>
> If you agree that this change to the value of recvTimeout is OK,
> please implement this one-line change in MVAPICH2, and consider
> contributing it upstream to MPICH2 as well.
>
> If you decline to make this change, at least it's now on the web that
> this change does fix the problem. :)
>
> Thanks,
> David
>
> David Kewley
> Dell Infrastructure Consulting Services
> Onsite Engineer at the Maui HPC Center
> Cell: 602-460-7617
> David_Kewley at Dell.com
>
> Dell Services: http://www.dell.com/services/
> How am I doing? Email my manager Russell_Kelly at Dell.com with any
> feedback.