[mvapich-discuss] mpirun_rsh error

Malek Musleh malek.musleh at gmail.com
Wed Apr 24 14:48:00 EDT 2013


I disabled iptables on the nodes I was trying to run on, and it worked, so
the firewall was the issue. That said, I need a long-term solution that
keeps the firewalls active while I run MPI experiments across the
InfiniBand ports. Would it be as simple as hacking the source code to have
MVAPICH use a fixed/hard-coded port number instead of performing a system
call?

Malek


On Wed, Apr 24, 2013 at 2:20 PM, Jonathan Perkins <
perkinjo at cse.ohio-state.edu> wrote:

> On Wed, Apr 24, 2013 at 02:13:45PM -0400, Malek Musleh wrote:
> > Yes, there are active firewalls on the network; I forgot about that
> > potentially being an issue. I don't think it's permissible for me to
> > disable the firewalls completely. Which ports can MVAPICH be configured
> > to use?
>
> MVAPICH2 doesn't use a predetermined port but one that is given to it by
> the OS when requested.  You don't need to disable the firewall but can
> you allow connections between machines that are on your network?
>
> >
> > The configure file was missing when I did an svn checkout; the branch
> > tarball (where the install worked) is the one I am using.
>
> Thanks.  There are also tarballs for the trunk branch; however, that won't
> resolve the issue that you're facing with the firewall setup.
>
> >
> > Malek
> >
> >
> > On Wed, Apr 24, 2013 at 2:10 PM, Jonathan Perkins <
> > perkinjo at cse.ohio-state.edu> wrote:
> >
> > > On Wed, Apr 24, 2013 at 01:33:29PM -0400, Malek Musleh wrote:
> > > > Hi,
> > > >
> > > > I am encountering a problem when running mpirun_rsh across any
> > > > external node (any node besides the host machine from which it is
> > > > launched).
> > > >
> > > > This is the command line I used:
> > > >
> > > > mpirun_rsh -np 1 10.2.4.4 ./helloworld
> > > >
> > > > (where the IP address is not the IP address of the current host). I
> > > > am able to ssh directly (without a password) to the machine, so I am
> > > > not sure why connectivity is an issue.
> > > >
> > > > I get the following error:
> > > >
> > > > [gpu6.east.isi.edu:mpirun_rsh][mpispawn_checkin] connect() failed:  (113)
> > > > [gpu6.east.isi.edu:mpirun_rsh][wfe_thread] Internal error: transition failed
> > >
> > > Is there an active firewall on the machines?  It looks like the connect
> > > call is failing when mpirun_rsh is trying to respond back to the remote
> > > node that just checked in.
> > >
> > > >
> > > > Likewise, when I run the command on node B to launch onto node A,
> > > > the same error occurs. Both machines have MVAPICH installed, and the
> > > > paths are set up as well.
> > > >
> > > > The revision I am using is: mvapich2-1.8-r5827
> > > >
> > > > This is not the latest, but when I tried the latest, it didn't have a
> > > > ./configure file, so I opted to try a branch version instead, hoping
> > > > it was more stable.
> > >
> > > Did you download a tarball or did you use svn to grab the latest code?
> > > If you downloaded a tarball missing configure please let us know which
> > > one is broken.  Everything looks fine with the tarballs for rc1 as well
> > > as the nightly tarballs for 1.8 and trunk.
> > >
> > > >
> > > > Any ideas? A Google search wasn't very helpful.
> > > >
> > > > Malek
> > >
> > > > _______________________________________________
> > > > mvapich-discuss mailing list
> > > > mvapich-discuss at cse.ohio-state.edu
> > > > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> > >
> > >
> > > --
> > > Jonathan Perkins
> > > http://www.cse.ohio-state.edu/~perkinjo
> > >
>
> --
> Jonathan Perkins
> http://www.cse.ohio-state.edu/~perkinjo
>