[mvapich-discuss] mvapich2 issue regarding mpd timeout in mpiexec

Qi Gao gaoq at cse.ohio-state.edu
Wed May 16 18:20:49 EDT 2007


Hi Greg,

> After looking at mpiexec.py and mpdlib.py I see that there is a parameter
> in mpiexec.py called
> recvTimeout
> that is set to 20.
>
> If we set this to a larger value, will this reduce the likelihood of
> getting the 'ack' timeout?

Thanks for reporting the problem. As far as I understand, increasing this
value should help, but I'm not absolutely sure, so it is worth a try. Also,
mpich-discuss may be a better place to ask mpd-related questions.
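
To illustrate why a larger receive timeout can help, here is a minimal,
self-contained sketch. It is not the actual mpd code: the function names
(recv_with_timeout, delayed_ack, try_once) and the delays are hypothetical,
and a local socket pair stands in for the mpiexec-to-mpd connection. It only
shows the general pattern: if the peer answers more slowly than the timeout
allows, the caller sees "no message", while a larger timeout lets the same
late ack through.

```python
import select
import socket
import threading
import time


def recv_with_timeout(sock, timeout):
    """Return one message from sock, or None if nothing arrives in time."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # the caller would report "no msg recvd" here
    return sock.recv(1024)


def delayed_ack(sock, delay):
    """Stand-in for a busy daemon that acknowledges only after `delay` seconds."""
    time.sleep(delay)
    sock.sendall(b"ack")


def try_once(timeout, delay):
    """Wait up to `timeout` seconds for an ack that is sent after `delay` seconds."""
    client, daemon = socket.socketpair()
    worker = threading.Thread(target=delayed_ack, args=(daemon, delay))
    worker.start()
    msg = recv_with_timeout(client, timeout)
    worker.join()
    client.close()
    daemon.close()
    return msg


if __name__ == "__main__":
    # Same slow peer, two different timeouts.
    print("short timeout:", try_once(timeout=0.05, delay=0.3))
    print("long timeout: ", try_once(timeout=1.0, delay=0.3))
```

If the ack is merely late (for example because startup is slow at 2048+
tasks), raising the timeout should hide the problem; if the ack is never
sent, a larger value only delays the error.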

Thanks.
--Qi

----- Original Message ----- 
From: "Gregory Bauer" <gbauer at ncsa.uiuc.edu>
To: <mvapich-discuss at cse.ohio-state.edu>
Sent: Tuesday, May 15, 2007 3:50 PM
Subject: [mvapich-discuss] mvapich2 issue regarding mpd timeout in mpiexec


> When running at scale (2048 tasks and greater, with 8 tasks per node or
> ppn=8) we occasionally see the following from mpiexec:
>
> mpiexec_abe1192 (mpiexec 411): no msg recvd from mpd when expecting ack of
> request
>
> The reporting node may change, so it is not tied to any node in
> particular.
>
> The sequence we use is:
> mpdboot
> mpdtrace
> mpiexec
>
> The output from mpdtrace is fine. The error appears only when mpiexec is
> ready to start up the actual MPI tasks.
>
> After looking at mpiexec.py and mpdlib.py I see that there is a parameter
> in mpiexec.py called
> recvTimeout
> that is set to 20.
>
> If we set this to a larger value, will this reduce the likelihood of
> getting the 'ack' timeout?
>
> This was with mvapich2-0.9.8-2007-05-03 but we are now at mvapich2-0.9.8p2
> for testing, using ofed 1.1.
>
> -Greg
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>


