[mvapich-discuss] Differing IB interfaces

Lundrigan, Adam LundriganA at DFO-MPO.GC.CA
Tue Jul 31 21:38:05 EDT 2007


Shaun,

Thanks for your detailed explanation.  I've applied it to our situation,
and it works perfectly, with only a slight modification to account for the
hostname of each machine being different from the hostname of its IPoIB
interface:

:~/mpitest> cat hostlist.txt
CNOOFS02-IB ifhn=CNOOFS02-IB
CNOOFS03-IB ifhn=CNOOFS03-IB
CNOOFS04-IB ifhn=CNOOFS04-IB
CNOOFS05-IB ifhn=CNOOFS05-IB

:~/mpitest> cat allotment.txt
CNOOFS01-IB:4
CNOOFS02-IB:4
CNOOFS03-IB:4
CNOOFS04-IB:4
CNOOFS05-IB:4

:~/mpitest> cat run.sh
#!/bin/bash
# Create MPD ring & display its members
mpdboot -n 5 -f hostlist.txt --ifhn=CNOOFS01-IB   ###### MODIFICATION HERE: ADDED --ifhn flag
mpdtrace
# Execute 4 processes on CNOOFS01 using ibd2, and the rest spread
# amongst the other 4 nodes using ibd0
mpiexec -machinefile allotment.txt -n 4 -env MV2_DAPL_PROVIDER ibd2 test.e : \
        -n 16 -env MV2_DAPL_PROVIDER ibd0 test.e
# Kill the ring
mpdallexit
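
A quick sanity check for that hostname mismatch (a hypothetical script, assuming
the -IB names are defined in /etc/hosts or DNS on every node) is to confirm that
each IPoIB hostname resolves before booting the ring:

#!/bin/bash
# check_ib_names.sh (hypothetical): confirm that every IPoIB hostname
# resolves on this node before mpdboot is run
for h in CNOOFS01-IB CNOOFS02-IB CNOOFS03-IB CNOOFS04-IB CNOOFS05-IB; do
    getent hosts "$h" > /dev/null || echo "$h does not resolve on $(hostname)"
done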


...and the end result:


:~/mpitest> ./run.sh
CNOOFS01
CNOOFS04
CNOOFS02
CNOOFS05
CNOOFS03

# OSU MPI Latency Test (Version 2.2)
# Size          Latency (us)
0               2.40
1               2.59
2               2.57
4               2.63
8               2.58
16              2.64
32              2.61
64              2.68
128             2.74
256             2.85
512             3.01
1024            3.74
2048            4.58
4096            6.40
8192            10.18
16384           19.19
32768           33.33
65536           60.87
131072          92.59
262144          163.16
524288          1172.81
1048576         934.62
2097152         2053.89
4194304         3544.28

...and again, with a simple parallel integration example:

:~/mpitest> ./run_mpi_int.sh
CNOOFS01
CNOOFS04
CNOOFS02
CNOOFS05
CNOOFS03

Processing with           6400000. increments on 20 processors.
Process  1 has the partial result of 0.0779753693123919067176075259340
Process  0 has the partial result of 0.0784590957278428619003918242925
Process  2 has the partial result of 0.0770108988156797263924246976785
Process  4 has the partial result of 0.0736664379901417615270631245039
Process  6 has the partial result of 0.0685080649764021332170926825711
Process 13 has the partial result of 0.0383663598342793149553742182434
Process  5 has the partial result of 0.0713070673744533589655247851624
Process  8 has the partial result of 0.0616627960377112982470748647756
Process  7 has the partial result of 0.0652866875765211740478832780354
Process  9 has the partial result of 0.0576587328563634149425354280538
Process 12 has the partial result of 0.0436231699791427840473545529676
Process  3 has the partial result of 0.0755716305190318182738451469049
Process 14 has the partial result of 0.0328730083229195613192530345259
The result = 0.9999999999999964472863211994991
Process 18 has the partial result of 0.0092289931379904224795218681265
Process 15 has the partial result of 0.0271769837838666261653486344585
Process 11 has the partial result of 0.0486110287749165481674396005474
Process 19 has the partial result of 0.0030826662668720443936931641105
Process 10 has the partial result of 0.0532991844134873826721587874999
Process 16 has the partial result of 0.0213134041025218254339357315530
Process 17 has the partial result of 0.0153184201974605593149503235395


WOOOOOO!  I can't believe how simple that was.  We tried a similar
approach before but couldn't get it to work: we would randomly get
"Cannot open IA" errors, which can be attributed to the fact that we
didn't use a machinefile when running mpiexec, so some of the processes
using ibd0 would end up on the node that uses ibd2, and vice versa.  At
least now we know why it didn't work :)

Thanks a million for your help,
--
Adam Lundrigan
Computer Systems Programmer
Biological & Physical Oceanography Section
Science, Oceans & Environment Branch
Department of Fisheries and Oceans Canada
Northwest Atlantic Fisheries Centre 
St. John's, NL    A1C 5X1

Tel: (709) 772-8136
Fax: (709) 772-8138
Cell: (709) 277-4575
Office:  G10-117J
Email: LundriganA at dfo-mpo.gc.ca








-----Original Message-----
From: Shaun Rowland [mailto:rowland at cse.ohio-state.edu] 
Sent: Friday, July 27, 2007 2:03 PM
To: Lundrigan, Adam
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] Differing IB interfaces

Lundrigan, Adam wrote:
> We're using MVAPICH2 with Infiniband on a 5-node Sun/Solaris cluster
> (Sun Fire x4100/x4200), and are having a problem with consistency in the
> naming of our ibd interfaces.  On the x4100 nodes, the IPoIB interface
> is ibd0.  However, on the head node (x4200), the interface is ibd2.
> We've tried everything short of wiping the machine and reinstalling the
> OS to force one of the two HCAs to have an ibd0, but thus far we have
> failed.  The only choices Solaris seems to use are ibd2, ibd3, ibd6 and
> ibd7 (we have 2 cards w/ 4 ports in that node)

If I understand this correctly, the head node uses ibd2 and the rest use
ibd0? In this case, you can use the machinefile argument to mpiexec to
set up where each process number is going to run. Once you do that, you
can use the mpiexec facility for passing specific environment variables
to each "group" of processes. For example, I have 5 hosts:

[rowland at s8 mvapich2-0.9.8-install]$ cat hosts
s8
s9
s10
s11
s12

Let's assume that s8 is your head node.  I start mpdboot on all hosts:

[rowland at s8 mvapich2-0.9.8-install]$ bin/mpdboot -n 5 -f hosts
[rowland at s8 mvapich2-0.9.8-install]$ bin/mpdtrace
s8
s12
s11
s10
s9

I've decided that I want to run 10 processes (2 on each node), so I
create the following machinefile:

[rowland at s8 mvapich2-0.9.8-install]$ cat machines
s8:2
s9:2
s10:2
s11:2
s12:2

This specifies exactly how many processes run on each host in order. I
can use this with mpiexec in the following way:

[rowland at s8 mvapich2-0.9.8-install]$ bin/mpiexec -machinefile machines 
-n 2 -env MV2_DAPL_PROVIDER a ./test.sh : -n 8 -env MV2_DAPL_PROVIDER b 
./test.sh
MV2_DAPL_PROVIDER = |a| [s8]
MV2_DAPL_PROVIDER = |b| [s12]
MV2_DAPL_PROVIDER = |b| [s11]
MV2_DAPL_PROVIDER = |b| [s11]
MV2_DAPL_PROVIDER = |b| [s12]
MV2_DAPL_PROVIDER = |b| [s10]
MV2_DAPL_PROVIDER = |a| [s8]
MV2_DAPL_PROVIDER = |b| [s10]
MV2_DAPL_PROVIDER = |b| [s9]
MV2_DAPL_PROVIDER = |b| [s9]

I told mpiexec to use that machinefile to figure out the ordering, then
I said:

- for the first 2 processes, set MV2_DAPL_PROVIDER to "a"
- for the next 8 processes, set MV2_DAPL_PROVIDER to "b"

For this to work, you have to have the machinefile set up correctly (the
:2 means the number of processes to run on a host; you can leave it out
if running just one process).  You can't run more processes than the
machinefile specifies.
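
If you want to double-check that, one way (a hypothetical one-liner, assuming
the host:count format shown above) is to total the slots the machinefile
declares and compare the result against the sum of your -n values:

awk -F: 'NF { n += ($2 == "" ? 1 : $2) } END { print n }' machines

For the five-host file above with :2 on each line this prints 10, which
matches the 2 + 8 processes requested.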

You might be able to try a simple case like this just to see if it works
(1 process per node):

[rowland at s8 mvapich2-0.9.8-install]$ cat machines
s8
s9
s10
s11
s12

[rowland at s8 mvapich2-0.9.8-install]$ bin/mpiexec -machinefile machines 
-n 1 -env MV2_DAPL_PROVIDER ibd2 ./test.sh : -n 4 -env MV2_DAPL_PROVIDER
ibd0 ./test.sh
MV2_DAPL_PROVIDER = |ibd0| [s12]
MV2_DAPL_PROVIDER = |ibd2| [s8]
MV2_DAPL_PROVIDER = |ibd0| [s11]
MV2_DAPL_PROVIDER = |ibd0| [s10]
MV2_DAPL_PROVIDER = |ibd0| [s9]

Of course, you would use your own test program here.  The login shell
startup files are not useful for this; the mpiexec process manages passing
the environment variables to the processes it launches.  By default, it
passes all environment variables in the current environment when started.
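
For reference, a minimal test.sh that prints output in the format shown above
(a sketch; the actual script may differ) could be as simple as:

#!/bin/bash
# Print the DAPL provider this rank was handed and the node it landed on,
# in the same format as the output above.
echo "MV2_DAPL_PROVIDER = |$MV2_DAPL_PROVIDER| [$(hostname)]"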

-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/


