[mvapich-discuss] IB benchmarks that are a bit strange

LEI CHAI chai.15 at osu.edu
Wed Mar 19 15:07:00 EDT 2008


Hi Brian,

> Thanks DK.  The systems are Opteron NUMA.  They both run processors at
> 2.4 GHz and memory is DDR2 667.  They are both running Mellanox
> InfiniHost III Lx HCAs.  Both nodes are running Linux 2.6.24 with the
> InfiniBand drivers built in.

Thanks for the system information.

> It definitely has something to do with my head node.  I've been
> running tests on 3 nodes to better characterize what I've been seeing,
> and in my test I see close to 800 MB/s from slave to slave and about
> 600 MB/s from either slave to head or head to either slave.

Since they are Opteron NUMA systems, we suspect that it has something to do with the distance between the processor and the IB adapter. The diagram below illustrates the layout:
 
 node 0               node 1
 3----2               2----3  
 |    |               |    |
 |    |               |    |
 1----0               0----1
      |               |
    adapter         adapter

In the above diagram core 0 is the closest to the adapter; therefore, if the cores 0 on the two nodes communicate with each other, you get the highest bandwidth. On the other hand, if the cores 3 communicate, the bandwidth can be 200 MB/s lower. We have observed this kind of behavior on Opteron NUMA systems before, and we have seen that the receiving process's position matters the most (the sender side has almost no impact).
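If you want to check this directly, one option (a minimal sketch, not MVAPICH's built-in affinity support) is to pin each benchmark process to a chosen core with sched_setaffinity() before any communication starts, and rerun osu_bw with the receiver on core 0, 1, 2, and 3 in turn. The OSU_BW_CORE variable name below is made up for the example:

    /* pin_core.c - pin the calling process to one core on Linux so the
     * osu_bw receiver can be placed on core 0, 1, 2, or 3 and the
     * bandwidths compared.  Call pin_to_core() at the top of main(),
     * before MPI_Init(), so communication buffers end up on the same
     * NUMA node as the process. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    void pin_to_core(void)
    {
        char *env = getenv("OSU_BW_CORE");   /* hypothetical name */
        int core = env ? atoi(env) : 0;      /* default: core 0 */
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(core, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0)
            perror("sched_setaffinity");
    }

If the head node behaves like the slaves once the receiver is pinned to the right core, that would confirm the distance explanation.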

Based on your description, it is very likely that the topology of your head node is different from that of the slave nodes - probably core 0 is the nearest to the adapter on the slave nodes but the farthest on the head node. Since by default the two processes run on the cores 0, when the head node is the receiver you would observe lower bandwidth. Could you verify the physical setup of your head node? It is preferable to make it identical to the slave nodes.
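One quick way to compare the topologies, assuming your kernel exposes the numa_node attribute under sysfs and the InfiniHost III shows up as mthca0 (adjust the path for your setup), is a small program like the sketch below; run it on the head node and on a slave and compare the output:

    /* hca_numa.c - report which NUMA node the HCA is attached to.
     * A value of -1 means the kernel does not know the affinity. */
    #include <stdio.h>

    int main(void)
    {
        const char *path = "/sys/class/infiniband/mthca0/device/numa_node";
        FILE *f = fopen(path, "r");
        int node;

        if (!f) {
            perror(path);
            return 1;
        }
        if (fscanf(f, "%d", &node) == 1)
            printf("HCA is attached to NUMA node %d\n", node);
        fclose(f);
        return 0;
    }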

> If it's relevant, I'm also running ip over ib, and my head node runs
> iptables.  Could this cause the slowdown?

We feel that this shouldn't be the cause. To make sure, could you disable IPoIB and see if the bandwidth difference disappears?

Lei


> 
> On Tue, Mar 18, 2008 at 7:36 PM, Dhabaleswar Panda
> <panda at cse.ohio-state.edu> wrote:
> > What kind of platforms are you using - Opteron NUMA or Intel? Are the
> > two systems homogeneous in terms of processor speed and memory speed?
> > Do the two systems have identical NICs - hw and firmware? Such
> > information will help to understand this problem better.
> >
> >  DK
> >
> >
> >
> >  On Sun, 16 Mar 2008, Brian Budge wrote:
> >
> >  > Hi all -
> >  >
> >  > I am running the osu_bw bandwidth test on my small cluster.  I'm
> >  > seeing some slightly strange behavior:  Let's say I have nodes 0 and
> >  > 1.  When I launch the bw test from node 0 and run the test on 0 and
> >  > 1, I see a max bandwidth of 650 MB/s.  However, when I run from node
> >  > 1 and run the test on 0 and 1, I see a max bandwidth of close to 850
> >  > MB/s.  Does anyone know how I might diagnose/fix this issue?  Has
> >  > anyone seen it before?
> >  >
> >  > Thanks,
> >  >   Brian
> >  >
> >
> >
> 


