[mvapich-discuss] MVAPICH error "connect: Network is unreachable"

Jonathan Perkins perkinjo at cse.ohio-state.edu
Tue Mar 8 14:03:57 EST 2011


Hello Robert.  There may be an issue with the version of mpirun_rsh
you're using, which is now getting outdated.  Our latest changes have
gone into mvapich2.  Can you try downloading
mvapich2-1.6rc3 from our website
(http://mvapich.cse.ohio-state.edu/download/mvapich2/)?  Please try
running your program using this with mpirun_rsh and/or mpiexec.hydra
to see if you continue to experience this problem.
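
For anyone following along, the suggestion above can be sketched as a
small shell script.  It assumes MVAPICH2 is installed with its bin
directory on PATH and that the program has been rebuilt against
MVAPICH2 (a binary built with MVAPICH 1.2.0 should not be expected to
run under it).  The host and program names are taken from the report
below; the try_launcher helper is hypothetical and only there to skip
a launcher that is not installed.

```shell
#!/bin/sh
# Sketch: run the same one-process test with whichever launchers are on PATH.
# HOST and PROG default to the names used in the original report (assumption:
# the program has been recompiled against MVAPICH2).
HOST=${HOST:-node01}
PROG=${PROG:-./cutiosmpi.mvapich}

try_launcher() {
    # $1 = launcher name; remaining arguments = full command line to run
    name=$1
    shift
    if ! command -v "$name" >/dev/null 2>&1; then
        echo "$name: not found on PATH, skipped"
        return 0
    fi
    echo "running: $*"
    status=0
    "$@" || status=$?
    echo "$name: exit status $status"
}

# mpirun_rsh takes the host list on the command line, as in the report.
try_launcher mpirun_rsh mpirun_rsh -np 1 "$HOST" "$PROG"
# Hydra takes hosts via -hosts (or a host file via -f).
try_launcher mpiexec.hydra mpiexec.hydra -n 1 -hosts "$HOST" "$PROG"
```

If one launcher works and the other does not, that narrows the problem
to the launcher rather than the library or the fabric.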

On Mon, Mar 7, 2011 at 9:44 PM, Robert Jacobi <rjacobi at email.arizona.edu> wrote:
> Hello,
>
> I have a very strange connection error that occurs ONLY when I try to launch
> a program with mvapich (mpirun_rsh) from my head node specifically on
> compute node01:
> ------------
> [18:10]:robert at salvator:~/TEST>mpirun_rsh -np 1 node01 cutiosmpi.mvapich
> connect: Network is unreachable
>
> Child exited abnormally!
> Killing remote processes...DONE
> [18:10]:robert at salvator:~/TEST>node01: Connection refused
> ------------
>
> To locate the problem I've tried the following (with the same program as
> before), which all worked fine:
> - launch the program with mvapich (mpirun_rsh) from the head node on any
> other compute node
> - launch the program with mvapich (mpirun_rsh) from compute node01 on compute
> node01
> - launch the program with mvapich (mpirun_rsh) from compute node02 on compute
> node01
> - launch the program with mvapich (mpirun_rsh) from compute node01 on the
> head node
> - launch the program with openmpi (mpirun) from the head node on compute
> node01 (compiled with openmpi)
>
> I made sure that:
> i) the connection is set up such that I can ssh passwordless into all
> compute nodes (including node01) from the head node (and from all the
> compute nodes into each other)
> ii) the compute nodes are in the hosts file with the right name
> iii) mvapich is loaded:
> ------------
> [18:16]:robert at salvator:~>mpi-selector --query
> default:mvapich_intel-1.2.0
> level:user
> [18:16]:robert at salvator:~>which mpirun_rsh
> /usr/mpi/intel/mvapich-1.2.0/bin/mpirun_rsh
> ------------
> Rebooting both the head node and the compute node01 didn't change anything.
> I've also tried this with benchmark tools and got the same error (both were
> compiled with mvapich and intel compiler).
>
> We're running RHEL5.5, 2.6.18-194.el5 x86_64 and use InfiniBand (Mellanox
> switch). The MVAPICH library came with the Mellanox OFED firmware tools
> ("mft-2.6.2-10" downloaded from
> http://mellanox.com/content/pages.php?pg=management_tools&menu_section=34)
>
> Thank you in advance for your help and please let me know if you need
> further information!
> I couldn't find any help for this error in the user guides or online and I'm
> at a complete loss how to go about fixing it.
>
> Robert
>
> --
> Robert Jacobi
> Research Assistant
> University of Arizona
> Department of Aerospace & Mechanical Engineering
> 1130 N. Mountain Ave.
> Tucson, AZ, 85721-0119
>
> tel: +1 (520) 621 4369
> mail: rjacobi at email.arizona.edu
>
>
> The less time you spent on algebra in life, the more time you have to be a
> happy person. (Kerschen)
>
> Doubt is not a pleasant condition, but certainty is absurd. (Voltaire)
>
> All great truths begin as blasphemies. (Shaw)
>
> Denken ist etwas, das auf Schwierigkeiten folgt und dem das Handeln
> vorausgeht.(Brecht)
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
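
The checks listed in the quoted message (host names resolving, and
passwordless ssh) can be repeated from the head node with something
like the sketch below.  The node names are assumptions taken from the
report, and the resolves helper is hypothetical.  One thing that may be
worth comparing is whether node01 resolves to the same address on the
head node as it does on the other nodes; this is only a guess at where
to look, not a diagnosis.

```shell
#!/bin/sh
# Sketch: re-check name resolution and passwordless ssh from this machine.
# NODES defaults to names mentioned in the report; override as needed.
NODES=${NODES:-"node01 node02"}

resolves() {
    # Succeeds if the name resolves through the system resolver
    # (/etc/hosts, DNS, ... as configured in nsswitch.conf).
    getent hosts "$1" >/dev/null 2>&1
}

for n in $NODES; do
    if resolves "$n"; then
        echo "$n resolves to: $(getent hosts "$n" | awk '{print $1}' | tr '\n' ' ')"
    else
        echo "$n: does not resolve on this machine"
        continue
    fi
    # BatchMode makes ssh fail instead of prompting for a password;
    # -n keeps ssh from reading this script's stdin.
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$n" true 2>/dev/null; then
        echo "$n: passwordless ssh OK"
    else
        echo "$n: passwordless ssh failed"
    fi
done
```

Running it on the head node and on another compute node and comparing
the printed addresses shows whether both see node01 the same way.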



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

