[mvapich-discuss] Problem with slow start in mpirun_rsh using mvapich1.1

Jaidev Sridhar sridharj at cse.ohio-state.edu
Tue Dec 2 12:46:18 EST 2008


Hi Terrence,

This timeout means we failed to launch mpispawn on some node within a 
reasonable amount of time. This could be due to ssh or other network 
issues: some node isn't accepting ssh connections or responds with a 
large delay.
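
One way to narrow down which node is slow is to time a trivial remote command on each host in the hostfile. The following is only a sketch and not part of MVAPICH: probe_hosts and PROBE_CMD are hypothetical names, it assumes the hostfile has one hostname per line (as the first field), and it assumes OpenSSH, whose BatchMode and ConnectTimeout options keep the probe from hanging on a password prompt or a dead node.

```shell
# Hypothetical helper, not part of MVAPICH: time a no-op remote command on
# every host listed in a hostfile to spot nodes that refuse or delay logins.

# Launcher used for the probe (assumption: OpenSSH client options).
: "${PROBE_CMD:=ssh -o BatchMode=yes -o ConnectTimeout=10}"

probe_hosts() {
    # $1: hostfile, assumed to hold one hostname per line (first field).
    # Blank lines and lines starting with '#' are skipped; duplicates are
    # probed only once.
    awk 'NF && $1 !~ /^#/ {print $1}' "$1" | sort -u |
    while read -r host; do
        start=$(date +%s)
        # stdin is redirected so the launcher cannot swallow the host list.
        $PROBE_CMD "$host" true </dev/null >/dev/null 2>&1
        rc=$?
        end=$(date +%s)
        if [ "$rc" -eq 0 ]; then
            printf '%s ok %ss\n' "$host" "$((end - start))"
        else
            printf '%s FAILED (exit %s) %ss\n' "$host" "$rc" "$((end - start))"
        fi
    done
}
```

After sourcing the function, it could be run against the same hostfile passed to mpirun_rsh:
   $ probe_hosts host.list
Hosts reported as FAILED, or with elapsed times close to the connect timeout, are the first candidates to check.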

If you want a larger timeout, export MPIRUN_TIMEOUT (in seconds) before 
launching:
   $ export MPIRUN_TIMEOUT=11111
   $ mpirun_rsh ...

You can also try rsh instead of ssh by passing the -rsh flag to mpirun_rsh.

-Jaidev

On Tuesday 02 December 2008 10:32 AM, Terrence.LIAO at total.com wrote:
> 
> Dear mvapich,
> 
> I have this slow start problem and do not know how to fix it.  I am 
> trying to run a very simple hello world on our IB cluster using this 
> typical command:
> mpirun_rsh -hostfile host.list -np 27 ./mpi_hello.exe
> From time to time it runs, but in most cases it gives me "Timeout 
> during client startup".  Any advice on fixing this problem?  Is 
> VIA_CM_TIMEOUT a parameter I should tune for this?
> 
> Thank you very much.
> 
> -- Terrence
> --------------------------------------------------------
> Terrence Liao, Ph.D.
> Research Computer Scientist
> TOTAL E&P RESEARCH & TECHNOLOGY USA, LLC
> 1201 Louisiana, Suite 1800, Houston, TX 77002
> Tel: 713.647.3498  Fax: 713.647.3638
> Email: terrence.liao at total.com
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss


