[mvapich-discuss] mpirun_rsh error

Jonathan Perkins perkinjo at cse.ohio-state.edu
Wed Apr 24 14:10:35 EDT 2013


On Wed, Apr 24, 2013 at 01:33:29PM -0400, Malek Musleh wrote:
> Hi,
> 
> I am encountering a problem when running mpirun_rsh across any external
> node (any node besides the host machine from where it is launched).
> 
> This is the command line I used:
> 
> mpirun_rsh -np 1 10.2.4.4 ./helloworld
> 
> (where the IP address is not that of the current host). I am able to
> ssh directly (without a password) to the machine, so I am not sure why
> connectivity is an issue.
> 
> I get the following error:
> 
> [gpu6.east.isi.edu:mpirun_rsh][mpispawn_checkin] connect() failed:  (113)
> [gpu6.east.isi.edu:mpirun_rsh][wfe_thread] Internal error: transition failed

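Is there an active firewall on either machine?  Error 113 is
EHOSTUNREACH ("No route to host") on Linux, which is a common symptom
of a firewall rejecting the connection.  A firewall that allows ssh
(port 22) can still block other ports, which would explain why
passwordless ssh works.  It looks like the connect() call is failing
when mpirun_rsh tries to respond to the remote node that just checked
in.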
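
One way to check this outside of MVAPICH2 is a plain TCP probe between
the two nodes on a port other than 22.  The sketch below is only an
illustration: probe() is a hypothetical helper, not part of mpirun_rsh,
and any port you pass it is an arbitrary choice rather than one
mpirun_rsh is known to use.

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connect; return "ok" or the symbolic errno name.

    Hypothetical diagnostic helper, not part of MVAPICH2.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "ok"
    except socket.timeout:
        # No answer at all: packets are probably being silently dropped.
        return "ETIMEDOUT"
    except OSError as exc:
        # Map the numeric errno (e.g. 113 on Linux) to its name.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()
```

For example, print(probe("10.2.4.4", 50000)) run from the launching host
tries port 50000 (an assumed, arbitrary port) on the remote node.  If
nothing is listening there and no firewall is in the way, I would expect
ECONNREFUSED; EHOSTUNREACH or ETIMEDOUT instead would point towards
filtering between the nodes.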

> 
> Likewise, when I run the command on node B to launch onto node A, the
> same error occurs. Both machines have MVAPICH installed, and the paths
> are set up as well.
> 
> The revision I am using is: mvapich2-1.8-r5827
> 
> This is not the latest, but when I tried the latest, it didn't have a
> ./configure file, so I opted to try a branch version instead, hoping it
> was more stable.

Did you download a tarball, or did you use svn to grab the latest code?
If you downloaded a tarball that is missing configure, please let us
know which one is broken.  Everything looks fine with the tarballs for
rc1 as well as the nightly tarballs for 1.8 and trunk.

> 
> Any ideas? A Google search wasn't much help.
> 
> Malek



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo

