[mvapich-discuss] Differing IB interfaces

Jonathan Perkins perkinjo at cse.ohio-state.edu
Fri Jul 27 08:12:33 EDT 2007


Lundrigan, Adam wrote:
> We’re using MVAPICH2 with Infiniband on a 5-node Sun/Solaris cluster 
> (Sun Fire x4100/x4200), and are having a problem with consistency in the 
> naming of our ibd interfaces.  On the x4100 nodes, the IPoIB interface 
> is ibd0.  However, on the head node (x4200), the interface is ibd2.  
> We’ve tried everything short of wiping the machine and reinstalling the 
> OS to force one of the two HCAs to have an ibd0, but thus far we have 
> failed.  The only choices Solaris seems to use are ibd2, ibd3, ibd6 and 
> ibd7 (we have 2 cards w/ 4 ports in that node)
> 
>  
> 
> Long story short:  Is there a way to force each instance of mpd to 
> connect using a different DAPL_PROVIDER?  We’ve tried setting the 
> environment on each node separately to the proper value, but that 
> doesn’t seem to work.  When I compile MVAPICH2 with DAPL_PROVIDER set to 
> ibd0, everything works fine if we restrict the ring to just the 4 nodes 
> which use ibd0 as their adapter.  However, when we set the DAPL_PROVIDER 
> variable to ibd2 for the head node (in both ~user/.profile and 
> /etc/profile), I get the following:

Hello Adam,
Your situation sounds similar to a problem that another user had in the 
past.  They were trying to use IPoIB with a non-default interface.

The directions that I'm sending below should allow mpiexec to work with 
different interfaces on different machines.

Assume that you have a cluster set up as follows:

 hostname | eth addr    | IPoIB addr
----------+-------------+-------------
 compute1 | 192.168.0.1 | 192.168.1.1
 compute2 | 192.168.0.2 | 192.168.1.2
 compute3 | 192.168.0.3 | 192.168.1.3
 compute4 | 192.168.0.4 | 192.168.1.4


In this scenario, you start mpd as usual.  However, you need to create 
a machine file that tells mpiexec which interface to use on each 
machine.

Example:
$ cat - > "$MPD_HOSTFILE"
compute1
compute2
compute3
compute4

$ mpdboot -n 4 -f "$MPD_HOSTFILE"

$ cat - > "$MACHINEFILE"
compute1 ifhn=192.168.1.1
compute2 ifhn=192.168.1.2
compute3 ifhn=192.168.1.3
compute4 ifhn=192.168.1.4

$ mpiexec -n "$NUM_PROCESSES" -f "$MACHINEFILE" "$MPI_APPLICATION"
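
If you would rather not type the machine file by hand, something like 
the following sketch could generate it.  It assumes a POSIX shell and 
reuses the example host names and IPoIB addresses from the table above; 
the default file name ./machinefile is only a placeholder, so substitute 
your own path and addresses.

```shell
#!/bin/sh
# Sketch: build the mpiexec machine file from "hostname IPoIB-address" pairs.
# The default name ./machinefile is a placeholder, not an MVAPICH2 convention.
MACHINEFILE=${MACHINEFILE:-./machinefile}

# Start from an empty file so reruns do not append duplicate entries.
: > "$MACHINEFILE"

# One pair per line; these are the example addresses from the table above.
while read -r host ipoib; do
    # Skip blank lines in the list.
    [ -n "$host" ] || continue
    printf '%s ifhn=%s\n' "$host" "$ipoib" >> "$MACHINEFILE"
done <<'EOF'
compute1 192.168.1.1
compute2 192.168.1.2
compute3 192.168.1.3
compute4 192.168.1.4
EOF

# Show the result for a quick visual check.
cat "$MACHINEFILE"
```

The resulting file can then be passed to mpiexec exactly as in the 
command above.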

The ifhn portion tells mpiexec to use the interface associated with that 
IP address on each machine.  Please let us know if this works for you.

-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo


