[mvapich-discuss] MVAPICH error "connect: Network is unreachable"
Robert Jacobi
rjacobi at email.arizona.edu
Mon Mar 7 21:44:34 EST 2011
Hello,
I am seeing a very strange connection error that occurs ONLY when I
launch a program with MVAPICH (mpirun_rsh) from my head node on
compute node01:
------------
[18:10]:robert@salvator:~/TEST>mpirun_rsh -np 1 node01 cutiosmpi.mvapich
connect: Network is unreachable
Child exited abnormally!
Killing remote processes...DONE
[18:10]:robert@salvator:~/TEST>node01: Connection refused
------------
To locate the problem I've tried the following (with the same program as
before), which all worked fine:
- launch the program with mvapich (mpirun_rsh) from the head node on any
other compute node
- launch the program with mvapich (mpirun_rsh) from compute node01 on
compute node01
- launch the program with mvapich (mpirun_rsh) from compute node02 on
compute node01
- launch the program with mvapich (mpirun_rsh) from compute node01 on the
head node
- launch the program with openmpi (mpirun) from the head node on compute
node01 (compiled with openmpi)
I made sure that:
i) passwordless ssh works from the head node into all compute nodes
(including node01), and from every compute node into every other
compute node
ii) the compute nodes are in the hosts file with the right name
iii) mvapich is loaded:
------------
[18:16]:robert@salvator:~>mpi-selector --query
default:mvapich_intel-1.2.0
level:user
[18:16]:robert@salvator:~>which mpirun_rsh
/usr/mpi/intel/mvapich-1.2.0/bin/mpirun_rsh
------------
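Checks i) and ii) above could be scripted roughly along these lines. This is only a minimal sketch: the function names lookup_in_hosts and check_ssh are made up for illustration, and it assumes the nodes are listed in a hosts-format file (/etc/hosts by default) and that OpenSSH is the ssh client.

```shell
#!/bin/sh
# Hypothetical helper: print the address(es) that a hosts-format file
# maps to a given name.
#   $1 = host name to look for
#   $2 = hosts file to read (defaults to /etc/hosts)
lookup_in_hosts() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # drop trailing comments before matching
        { for (i = 2; i <= NF; i++) if ($i == name) print $1 }
    ' "${2:-/etc/hosts}"
}

# Hypothetical helper: try a passwordless ssh login on each node given
# as an argument. BatchMode=yes makes ssh fail instead of prompting for
# a password, so a prompt counts as a failure.
check_ssh() {
    for node in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null; then
            echo "$node: ssh ok"
        else
            echo "$node: ssh FAILED"
        fi
    done
}
```

For example, `lookup_in_hosts node01` prints whatever address the head node's hosts file assigns to node01, and `check_ssh node01 node02` reports which of the two nodes accept a passwordless login. Comparing the lookup result on the head node with the result on a compute node would show whether both resolve node01 to the same address.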
Rebooting both the head node and compute node01 didn't change
anything. I've also tried this with benchmark tools and got the same
error (both were compiled with MVAPICH and the Intel compiler).
We're running RHEL 5.5 (kernel 2.6.18-194.el5, x86_64) and use InfiniBand
(Mellanox switch). The MVAPICH library came with the Mellanox OFED
firmware tools ("mft-2.6.2-10", downloaded from
http://mellanox.com/content/pages.php?pg=management_tools&menu_section=34).
Thank you in advance for your help, and please let me know if you need
further information!
I couldn't find any help for this error in the user guides or online, and
I'm at a complete loss as to how to go about fixing it.
Robert
--
Robert Jacobi
Research Assistant
University of Arizona
Department of Aerospace & Mechanical Engineering
1130 N. Mountain Ave.
Tucson, AZ, 85721-0119
tel: +1 (520) 621 4369
mail: rjacobi at email.arizona.edu
The less time you spent on algebra in life, the more time you have to be a happy person. (Kerschen)
Doubt is not a pleasant condition, but certainty is absurd. (Voltaire)
All great truths begin as blasphemies. (Shaw)
Thinking is something that follows difficulties and precedes action. (Brecht)