[mvapich-discuss] mvapich2 issue regarding mpd timeout in mpiexec

Gregory Bauer gbauer at ncsa.uiuc.edu
Tue May 15 15:50:15 EDT 2007


When running at scale (2048 tasks or more, with 8 tasks per node, i.e. 
ppn=8), we occasionally see the following from mpiexec:

mpiexec_abe1192 (mpiexec 411): no msg recvd from mpd when expecting ack of request

The reporting node changes between occurrences, so the problem is not 
tied to any particular node.

The sequence we use is:
mpdboot
mpdtrace
mpiexec

The output from mpdtrace is fine. The error appears only when mpiexec is 
ready to start up the actual MPI tasks.

After looking at mpiexec.py and mpdlib.py, I see that mpiexec.py has a 
parameter called
recvTimeout
that is set to 20.

If we set this to a larger value, will that reduce the likelihood of 
getting the 'ack' timeout?
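
To make the question concrete, here is a minimal sketch of how a receive 
timeout of this kind generally behaves. It is not the code from mpdlib.py: 
the function name recv_with_timeout, the socket pair standing in for the 
mpiexec-to-mpd connection, and the assumption that the value 20 is in 
seconds are all illustrative guesses.

```python
import select
import socket

# Assumption: the value 20 seen in mpiexec.py is a number of seconds.
RECV_TIMEOUT = 20


def recv_with_timeout(sock, timeout):
    """Return the next chunk read from sock, or None if nothing arrives in time.

    Hypothetical helper for illustration; not part of mpdlib.py.
    """
    # select() blocks until sock is readable or `timeout` seconds have elapsed.
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # timed out: the peer sent nothing within the window
    return sock.recv(4096)


if __name__ == "__main__":
    # Hypothetical stand-ins for the two ends of the mpiexec <-> mpd link.
    requester, daemon = socket.socketpair()

    # Nothing has been sent yet, so a short wait ends in a timeout.
    if recv_with_timeout(requester, 0.1) is None:
        print("no msg recvd when expecting ack")

    # Once the other end replies, the same call returns the data.
    daemon.sendall(b"ack")
    print(recv_with_timeout(requester, 0.1))

    requester.close()
    daemon.close()
```

Under this model, a larger timeout helps only if the ack is merely late at 
this scale. If the ack is never sent, a larger value would just delay the 
error message.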

This was with mvapich2-0.9.8-2007-05-03; we have since moved to 
mvapich2-0.9.8p2 for testing, using OFED 1.1.

-Greg

