[mvapich-discuss] mpirun_rsh error

Malek Musleh malek.musleh at gmail.com
Wed Apr 24 16:04:24 EDT 2013


Yep, it worked. I used the environment variable MPIEXEC_PORT_RANGE=min:max
and then opened up those ports in the iptables config.
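
Roughly what that looked like on my end (the 50000:50100 range below is just
a placeholder example; any unused range should work):

  # pick a fixed port range for the Hydra launcher
  export MPIEXEC_PORT_RANGE=50000:50100

  # open that same range in iptables on each node (RHEL-style, run as root)
  iptables -I INPUT -p tcp --dport 50000:50100 -j ACCEPT
  service iptables save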

[root at hp4 mpi]# /root/mvapich2/bin/mpiexec.hydra -f mpi_hosts -iface ib0 -n 2 ./helloworld
Hello world from processor hp4, rank 1 out of 2 processors
Hello world from processor gpu6 rank 0 out of 2 processors

Thanks for the help,

Malek


On Wed, Apr 24, 2013 at 3:11 PM, Jonathan Perkins <
perkinjo at cse.ohio-state.edu> wrote:

> On Wed, Apr 24, 2013 at 02:48:00PM -0400, Malek Musleh wrote:
> > I disabled iptables on the nodes I was trying to run on, and it worked, so
> > the firewall was the issue. That being said, I need to find a long-term
> > solution so I can keep the firewalls active and still run MPI experiments
> > across the InfiniBand ports. Would it be as simple as hacking the source
> > code to have MVAPICH2 use a fixed/hard-coded port number instead of
> > performing a system call?
>
> I think it'll be easier to set up iptables to allow connections between
> your compute nodes (for instance, allow connections from 192.168.1.1/16)
> than to try to modify MVAPICH2 to use a set range of ports for incoming
> connections.
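>
> Something along these lines on each compute node (with the subnet adjusted
> to match your cluster) should be enough:
>
>   # accept any traffic coming from the cluster subnet
>   iptables -I INPUT -s 192.168.1.1/16 -j ACCEPT
>   service iptables save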
>
> Setting the ports to a restricted range for mpirun_rsh and mpispawn is
> possible but will require logic to find a new port to use if the
> "standard" one is in use already.
>
> It looks like you can also try using mpiexec (hydra) with the
> MPIEXEC_PORT_RANGE variable if you want to restrict the ports used by
> the launcher.
>
>
> http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager#Environment_Settings
>
> Please let me know if this works.
>
> >
> > Malek
> >
> >
> > On Wed, Apr 24, 2013 at 2:20 PM, Jonathan Perkins <
> > perkinjo at cse.ohio-state.edu> wrote:
> >
> > > On Wed, Apr 24, 2013 at 02:13:45PM -0400, Malek Musleh wrote:
> > > > Yes, there are active firewalls on the network; I forgot about that
> > > > potentially being an issue. I don't think it's permissible for me to
> > > > disable the firewalls completely. Which ports can MVAPICH2 be
> > > > configured to use?
> > >
> > > MVAPICH2 doesn't use a predetermined port but one that is given to it
> > > by the OS when requested.  You don't need to disable the firewall
> > > entirely; can you allow connections between the machines that are on
> > > your network?
> > >
> > > >
> > > > The missing configure file was from when I did an svn checkout; the
> > > > branch tarball (where the install worked) is the one I am using.
> > >
> > > Thanks.  There are also tarballs for the trunk branch; however, that
> > > won't resolve the issue that you're facing with the firewall setup.
> > >
> > > >
> > > > Malek
> > > >
> > > >
> > > > On Wed, Apr 24, 2013 at 2:10 PM, Jonathan Perkins <
> > > > perkinjo at cse.ohio-state.edu> wrote:
> > > >
> > > > > On Wed, Apr 24, 2013 at 01:33:29PM -0400, Malek Musleh wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I am encountering a problem when running mpirun_rsh across any
> > > > > > external node (any node besides the host machine from where it is
> > > > > > launched).
> > > > > >
> > > > > > This is the command line I used:
> > > > > >
> > > > > > mpirun_rsh -np 1 10.2.4.4 ./helloworld
> > > > > >
> > > > > > (where the IP address is not the IP address of the current host). I
> > > > > > am able to ssh directly (without a password) to the machine, so I am
> > > > > > not sure why connectivity is an issue.
> > > > > >
> > > > > > I get the following error:
> > > > > >
> > > > > > [gpu6.east.isi.edu:mpirun_rsh][mpispawn_checkin] connect() failed: (113)
> > > > > > [gpu6.east.isi.edu:mpirun_rsh][wfe_thread] Internal error: transition failed
> > > > >
> > > > > Is there an active firewall on the machines?  It looks like the
> > > > > connect call is failing when mpirun_rsh is trying to respond back to
> > > > > the remote node that just checked in.
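> > > > >
> > > > > Errno 113 is EHOSTUNREACH ("No route to host"), which is typically
> > > > > what you see when an iptables REJECT rule answers with
> > > > > icmp-host-prohibited.  A quick way to check is to stop iptables on
> > > > > both nodes just for a test run, e.g. as root:
> > > > >
> > > > >   service iptables stop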
> > > > >
> > > > > >
> > > > > > Likewise, when I run the command from node B to launch onto node A,
> > > > > > the same error occurs. Both machines have MVAPICH2 installed, and
> > > > > > the paths are set up as well.
> > > > > >
> > > > > > The revision I am using is: mvapich2-1.8-r5827
> > > > > >
> > > > > > This is not the latest, but when I tried the latest, it didn't have
> > > > > > a ./configure file, so I opted to try a branch version instead,
> > > > > > hoping it was more stable.
> > > > >
> > > > > Did you download a tarball or did you use svn to grab the latest
> > > > > code?  If you downloaded a tarball missing configure, please let us
> > > > > know which one is broken.  Everything looks fine with the tarballs
> > > > > for rc1 as well as the nightly tarballs for 1.8 and trunk.
> > > > >
> > > > > >
> > > > > > Any ideas?  A Google search wasn't quite helpful.
> > > > > >
> > > > > > Malek
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Jonathan Perkins
> > > > > http://www.cse.ohio-state.edu/~perkinjo
> > > > >
> > >
> > > --
> > > Jonathan Perkins
> > > http://www.cse.ohio-state.edu/~perkinjo
> > >
>
> --
> Jonathan Perkins
> http://www.cse.ohio-state.edu/~perkinjo
>