[mvapich-discuss] OpenFabrics

Kevin Ball kball at pathscale.com
Thu Aug 31 17:35:59 EDT 2006


Hi Sayantan,

On Thu, 2006-08-31 at 14:16, Sayantan Sur wrote:
> Hello Kevin,
> 
> >With MVAPICH-0.9.8, I no longer see the behavior with the vanilla osu
> >benchmarks (osu_bw and osu_bibw).  However, I still see the problem with
> >a somewhat modified version I have (We submitted a version of this to
> >Dr. Panda, or you can find it at
> >http://www.pathscale.com/performance/InfiniPath/mpi_multibw/mpi_multibw.html
> >  
> >
> Thanks for verifying that you don't see the problem with the OSU 
> benchmarks as is, but only with the PathScale modified benchmark. How 
> about other well known MPI benchmarks? Do you see the same problem with 
> them too?
> 
> I am attaching a patch to the test you provided, could you please let us 
> know if this makes the program work correctly?

This does not fix the problem.  Also, this patch leaves the program
vulnerable to misreporting bandwidth when multiple processes are
sending across one link.  Because timing is done locally rather than
globally, and there is no barrier at the start of timing, the
processes can serialize; each one then reports full link bandwidth,
and the benchmark never notices that the total completion time has
doubled.
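
By "globally" I mean something along these lines (just a sketch of the
idea, not the actual mpi_multibw code): every rank hits a barrier
immediately before the timed region, and the reported time is the
maximum elapsed time over all ranks rather than each rank's local
measurement.

#include <mpi.h>
#include <stdio.h>

/* Sketch of globally synchronized timing: all ranks start the clock
 * together, and the reported time is the slowest rank's, so serialized
 * senders cannot each claim full link bandwidth. */
int main(int argc, char **argv)
{
    double t0, t1, local, global;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);      /* everyone starts together */
    t0 = MPI_Wtime();

    /* ... the sends/receives being measured would go here ... */

    t1 = MPI_Wtime();
    local = t1 - t0;

    /* Report the maximum elapsed time over all ranks; bandwidth is
     * then bytes_moved / global, not bytes_moved / local. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_MAX, 0,
               MPI_COMM_WORLD);
    if (rank == 0)
        printf("elapsed (max over ranks): %f s\n", global);

    MPI_Finalize();
    return 0;
}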


> 
> >
> >  To this point I have not succeeded in doing this.  I set up a hosts
> >file with the ib0 (IPoIB) net addresses in it, but what appears to
> >happen is that mpirun ssh's over IPoIB to start the jobs, but the jobs
> >then communicate over the ethernet link between them.
> >
> >  Have you played with this previously and have any advice on how to get
> >it to work?  I'll keep tinkering, but if you have any thoughts they
> >would be appreciated.
> >  
> >
> Well, in the past, I just used the IPoIB addresses instead of hostnames. 
> If you have hostnames attached to those IPs, that should work too, IMHO.

What I did is:

[root at hwlab-dhcp-228 OSU]# ifconfig ib0
ib0       Link encap:UNSPEC  HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:192.168.9.251  Bcast:192.168.9.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:107964883 errors:0 dropped:0 overruns:0 frame:0
          TX packets:29724895 errors:0 dropped:1 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:99387087701 (92.5 GiB)  TX bytes:18440322417 (17.1 GiB)

(other one is 192.168.9.252)

[root at hwlab-dhcp-228 OSU]# cat > mpihosts
192.168.9.251
192.168.9.252

[root at hwlab-dhcp-228 OSU]# mpirun -np 2 -machinefile mpihosts ./osu_latency


I do not have hostnames currently attached to the ib0 addresses.  I'll
look at doing this and see if it helps.
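
If I do set that up, I expect it is just a matter of adding entries
along these lines to /etc/hosts on each node (the names below are made
up) and then listing those names in mpihosts instead of the raw
addresses:

192.168.9.251   node1-ib0
192.168.9.252   node2-ib0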

-Kevin

> 
> Thanks,
> Sayantan.


