[mvapich-discuss] Differing IB interfaces
Jonathan Perkins
perkinjo at cse.ohio-state.edu
Fri Jul 27 08:12:33 EDT 2007
Lundrigan, Adam wrote:
> We’re using MVAPICH2 with Infiniband on a 5-node Sun/Solaris cluster
> (Sun Fire x4100/x4200), and are having a problem with consistency in the
> naming of our ibd interfaces. On the x4100 nodes, the IPoIB interface
> is ibd0. However, on the head node (x4200), the interface is ibd2.
> We’ve tried everything short of wiping the machine and reinstalling the
> OS to force one of the two HCAs to have an ibd0, but thus far we have
> failed. The only choices Solaris seems to use are ibd2, ibd3, ibd6 and
> ibd7 (we have 2 cards w/ 4 ports in that node)
>
> Long story short: Is there a way to force each instance of mpd to
> connect using a different DAPL_PROVIDER? We’ve tried setting the
> environment on each node separately to the proper value, but that
> doesn’t seem to work. When I compile MVAPICH2 with DAPL_PROVIDER set to
> ibd0, everything works fine if we restrict the ring to just the 4 nodes
> which use ibd0 as their adapter. However, when we set the DAPL_PROVIDER
> variable to ibd2 for the head node (in both ~user/.profile and
> /etc/profile), I get the following:
Hello Adam,
Your situation sounds similar to another problem that a user has had in
the past. They were trying to use IPoIB with a non-default interface.
The directions that I'm sending below should allow mpiexec to work with
different interfaces on different machines.
Assume that you have a cluster set up as follows:
hostname | eth addr    | IPoIB addr
---------+-------------+------------
compute1 | 192.168.0.1 | 192.168.1.1
compute2 | 192.168.0.2 | 192.168.1.2
compute3 | 192.168.0.3 | 192.168.1.3
compute4 | 192.168.0.4 | 192.168.1.4
In this scenario, you start up mpd as usual. However, you will
need to create a machine file for mpiexec that tells it which
interface to use on each host.
Example:
$ cat - > "$MPD_HOSTFILE"
compute1
compute2
compute3
compute4
$ mpdboot -n 4 -f "$MPD_HOSTFILE"
$ cat - > "$MACHINEFILE"
compute1 ifhn=192.168.1.1
compute2 ifhn=192.168.1.2
compute3 ifhn=192.168.1.3
compute4 ifhn=192.168.1.4
$ mpiexec -n "$NUM_PROCESSES" -f "$MACHINEFILE" "$MPI_APPLICATION"
The ifhn portion tells mpiexec to use the interface associated with that
IP address on each machine. Please let us know if this works for you.
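
If the per-node addresses are already kept in a simple list, the machine
file can be generated instead of typed by hand. What follows is only a
minimal sketch: the helper name make_machinefile and the "hostname
IPoIB-address" input format are assumptions made for illustration and are
not part of MVAPICH2 or mpd.

```shell
#!/bin/sh
# Hypothetical helper (not part of MVAPICH2 or mpd): reads
# "hostname ipoib_addr" pairs on stdin and prints one machine-file
# line per host in the "hostname ifhn=addr" form shown above.
make_machinefile() {
    while read -r host addr; do
        # Skip blank lines and lines starting with '#'.
        case "$host" in
            ''|'#'*) continue ;;
        esac
        printf '%s ifhn=%s\n' "$host" "$addr"
    done
}

# Example input: the IPoIB addresses from the table above.
make_machinefile <<'EOF'
compute1 192.168.1.1
compute2 192.168.1.2
compute3 192.168.1.3
compute4 192.168.1.4
EOF
```

Redirecting the output to the machine file gives the same four lines as in
the example, and a node whose interface has a different name only needs its
own address on its line.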
--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo