[mvapich-discuss] Differing IB interfaces

Shaun Rowland rowland at cse.ohio-state.edu
Fri Jul 27 13:32:46 EDT 2007


Lundrigan, Adam wrote:
> We're using MVAPICH2 with Infiniband on a 5-node Sun/Solaris cluster 
> (Sun Fire x4100/x4200), and are having a problem with consistency in the 
> naming of our ibd interfaces.  On the x4100 nodes, the IPoIB interface 
> is ibd0.  However, on the head node (x4200), the interface is ibd2.  
> We've tried everything short of wiping the machine and reinstalling the 
> OS to force one of the two HCAs to have an ibd0, but thus far we have 
> failed.  The only choices Solaris seems to use are ibd2, ibd3, ibd6 and 
> ibd7 (we have 2 cards w/ 4 ports in that node)

If I understand this correctly, the head node uses ibd2 and the rest use
ibd0? In that case, you can use the machinefile argument to mpiexec to
set up where each process number is going to run. Once you do that, you
can use the mpiexec facility for passing specific environment variables
to each "group" of processes. For example, I have 5 hosts:

[rowland at s8 mvapich2-0.9.8-install]$ cat hosts
s8
s9
s10
s11
s12

Let's assume that s8 is your head node.  I start mpdboot on all hosts:

[rowland at s8 mvapich2-0.9.8-install]$ bin/mpdboot -n 5 -f hosts
[rowland at s8 mvapich2-0.9.8-install]$ bin/mpdtrace
s8
s12
s11
s10
s9

I've decided that I want to run 10 processes (2 on each node). So I
create the following machinefile:

[rowland at s8 mvapich2-0.9.8-install]$ cat machines
s8:2
s9:2
s10:2
s11:2
s12:2

This specifies exactly how many processes run on each host in order. I
can use this with mpiexec in the following way:

[rowland at s8 mvapich2-0.9.8-install]$ bin/mpiexec -machinefile machines \
    -n 2 -env MV2_DAPL_PROVIDER a ./test.sh : \
    -n 8 -env MV2_DAPL_PROVIDER b ./test.sh
MV2_DAPL_PROVIDER = |a| [s8]
MV2_DAPL_PROVIDER = |b| [s12]
MV2_DAPL_PROVIDER = |b| [s11]
MV2_DAPL_PROVIDER = |b| [s11]
MV2_DAPL_PROVIDER = |b| [s12]
MV2_DAPL_PROVIDER = |b| [s10]
MV2_DAPL_PROVIDER = |a| [s8]
MV2_DAPL_PROVIDER = |b| [s10]
MV2_DAPL_PROVIDER = |b| [s9]
MV2_DAPL_PROVIDER = |b| [s9]

I told mpiexec to use that machinefile to figure out the ordering, then
I said:

- for the first 2 processes, set MV2_DAPL_PROVIDER to "a"
- for the next 8 processes, set MV2_DAPL_PROVIDER to "b"
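
The colon-separated form above can be sketched as a small helper that
only prints the command line, so it can be checked before launching
anything. build_mpiexec_cmd is a hypothetical name, not part of
MVAPICH2. It uses only the -machinefile, -n and -env arguments shown
above, and assumes the head-node ranks come first in the machinefile.

```shell
#!/bin/sh
# Sketch only: print (do not run) an mpiexec command line that gives one
# MV2_DAPL_PROVIDER value to the first group of ranks and another value
# to the remaining ranks. build_mpiexec_cmd is a hypothetical helper.
build_mpiexec_cmd() {
    machinefile=$1   # file listing hosts in rank order
    head_n=$2        # number of ranks in the first group (head node)
    head_prov=$3     # MV2_DAPL_PROVIDER value for the first group
    rest_n=$4        # number of ranks in the second group
    rest_prov=$5     # MV2_DAPL_PROVIDER value for the second group
    prog=$6          # program to launch
    printf 'mpiexec -machinefile %s -n %s -env MV2_DAPL_PROVIDER %s %s : -n %s -env MV2_DAPL_PROVIDER %s %s\n' \
        "$machinefile" "$head_n" "$head_prov" "$prog" \
        "$rest_n" "$rest_prov" "$prog"
}

# Reproduces the shape of the 2 + 8 example above.
build_mpiexec_cmd machines 2 a 8 b ./test.sh
```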

For this to work, the machinefile has to be set up correctly. The :2 
is the number of processes to run on that host; you can leave it out 
if you are running just one process there.  You can't run more 
processes than the machinefile specifies.
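
As a rough sketch of the machinefile format described here, the two
helpers below build a host:count file from a plain host list and add up
how many processes a machinefile allows. The function names are
hypothetical and not part of MVAPICH2. The only assumptions are the
one-host-per-line layout and the optional :N suffix shown in this
message.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not MVAPICH2 tools.

# Append ":N" to every non-empty line of a plain host list.
make_machinefile() {
    hosts_file=$1   # one host name per line
    per_host=$2     # processes to run on each host
    awk -v n="$per_host" 'NF { print $1 ":" n }' "$hosts_file"
}

# Sum the process counts in a machinefile.
# A host listed without ":N" counts as one process.
count_slots() {
    awk -F: 'NF { total += ($2 == "" ? 1 : $2) } END { print total + 0 }' "$1"
}

# Example use (assumes a file named "hosts" exists):
#   make_machinefile hosts 2 > machines
#   count_slots machines
```

Comparing the count_slots result with the sum of the -n values before
calling mpiexec is a cheap way to catch a request for more processes
than the machinefile allows.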

You could try a simple case like this first, just to see if it works 
(1 process per node):

[rowland at s8 mvapich2-0.9.8-install]$ cat machines
s8
s9
s10
s11
s12

[rowland at s8 mvapich2-0.9.8-install]$ bin/mpiexec -machinefile machines \
    -n 1 -env MV2_DAPL_PROVIDER ibd2 ./test.sh : \
    -n 4 -env MV2_DAPL_PROVIDER ibd0 ./test.sh
MV2_DAPL_PROVIDER = |ibd0| [s12]
MV2_DAPL_PROVIDER = |ibd2| [s8]
MV2_DAPL_PROVIDER = |ibd0| [s11]
MV2_DAPL_PROVIDER = |ibd0| [s10]
MV2_DAPL_PROVIDER = |ibd0| [s9]

Of course, you would use your own test program in place of test.sh.  
The login shell startup files are not useful here: the mpiexec process 
manages passing the environment variables to the processes it launches.  
By default, it passes all environment variables in its current 
environment when started.
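
The contents of test.sh are not shown in this message. A minimal
stand-in that prints lines in the same format as the output above might
look like the following. This is an assumption about what such a script
could contain, not the original script. It uses uname -n to get the host
name.

```shell
#!/bin/sh
# Hypothetical stand-in for test.sh (the original script is not shown).
# Prints the value of MV2_DAPL_PROVIDER seen by this process, followed by
# the name of the host it is running on.
report_provider() {
    printf 'MV2_DAPL_PROVIDER = |%s| [%s]\n' "${MV2_DAPL_PROVIDER:-}" "$(uname -n)"
}

report_provider
```

If the variable is not set for a process, the value between the bars is
empty, which makes a missing -env argument easy to spot.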

-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/

