[mvapich-discuss] Followup: mvapich2 issue regarding mpd timeout in mpiexec

David_Kewley at Dell.com
Thu May 29 23:06:43 EDT 2008


This is a followup to this thread:

http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2007-May/000834.html

between Greg Bauer and Qi Gao.

We had the same problem that Greg saw -- failure of mpiexec, with the
characteristic error message "no msg recvd from mpd when expecting ack
of request".  It was resolved for us by setting recvTimeout in
mpiexec.py to a higher value, just as Greg suggested and Qi concurred.
The default value is 20 seconds; we chose 200 (we did not experiment with
values in between, so a lower value may suffice in many cases).
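
For reference, the one-line change is just the recvTimeout assignment in
mpiexec.py.  The exact line and surrounding context may differ between
releases, but in our copy it amounts to this:

    # mpiexec.py (mpd process manager); where this assignment lives may
    # vary from release to release
    recvTimeout = 200    # was 20; seconds mpiexec waits for an ack from mpd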

I think this change should be made permanent in MVAPICH2.  I do not
think it will negatively impact anyone, because in the four cases where
this timeout is used, if the timeout expires mpiexec immediately makes
an error exit anyway.  So the worst consequence is that mpiexec would
take longer to fail (3 minutes longer if 200 is used instead of 20).
A user who hits this timeout has to fix its root cause before they can get
any work done, so they are unlikely to hit it repeatedly and lose significant
runtime merely because the timeout is large.  Is this analysis correct?
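
To make that reasoning concrete: in each of those cases the pattern is "wait
up to recvTimeout seconds for an ack; if none arrives, report an error and
exit".  The following is not the real mpiexec.py code, just a minimal Python
sketch of why a larger timeout only slows the failure path, never the success
path:

    import select, sys

    def recv_ack_or_die(sock, recvTimeout):
        # Wait up to recvTimeout seconds for the mpd's ack to arrive.
        ready = select.select([sock], [], [], recvTimeout)[0]
        if not ready:
            # Timeout expired: give up immediately.  A larger recvTimeout
            # only delays reaching this point; it adds nothing when the
            # ack actually arrives.
            print('no msg recvd from mpd when expecting ack of request')
            sys.exit(-1)
        return sock.recv(4096)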

Meanwhile, this change would clearly help at least some people with
large clusters.  We see failure with the default recvTimeout between 900
and 1000 processes; larger recvTimeout allows us to scale to 3000
processes and beyond.

The default setting does not cause failure if I make a simple, direct call
to mpiexec; I only see the failure when I use mpirun.lsf to launch a large
job.  I suspect the LSF case fails because of the additional time it takes
to launch LSF's TaskStarter for every process, and that time appears to grow
as O(#processes).  (We run LSF 6.2, with a local custom wrapper script for
TaskStarter.)
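
For a rough sense of scale, here is a back-of-envelope estimate using only
the numbers in this message (it assumes the LSF launch time really is linear
in the process count):

    # The timeout first trips somewhere between 900 and 1000 processes with
    # the default 20-second recvTimeout, so per-process launch overhead is:
    per_proc = 20.0 / 900            # ~0.022 s per process
    # At 3000 processes the launch would then need roughly:
    needed = 3000 * per_proc         # ~67 s, comfortably under 200
    print(per_proc, needed)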

If you agree that this change to the value of recvTimeout is OK, please
implement this one-line change in MVAPICH2, and consider contributing it
upstream to MPICH2 as well.

If you decline to make this change, at least it's now on the web that
this change does fix the problem. :)

Thanks,
David

David Kewley
Dell Infrastructure Consulting Services
Onsite Engineer at the Maui HPC Center
Cell: 602-460-7617
David_Kewley at Dell.com

Dell Services: http://www.dell.com/services/
How am I doing? Email my manager Russell_Kelly at Dell.com with any
feedback.



