[mvapich-discuss] IB benchmarks that are a bit strange

Brian Budge brian.budge at gmail.com
Wed Mar 19 15:30:02 EDT 2008


Hi Lei -

Thanks for the information.  I suppose that could be the problem.  The
master node does have a different setup from the slaves: it is a 4-socket
system, while the slaves are 2-socket systems.  Is there any way to
verify this?
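
One thing I can think of trying is reading the NUMA node that the HCA's PCI
device reports through sysfs.  Something like this rough, untested sketch -
it assumes the card shows up as mthca0 under /sys/class/infiniband and that
the kernel exposes a numa_node attribute, both of which are guesses on my
part, so the path may need adjusting:

  /* Rough sketch: print the NUMA node the HCA's PCI device reports.
   * Assumes the adapter appears as mthca0 and that the kernel exposes
   * a numa_node attribute; -1 means the kernel doesn't know. */
  #include <stdio.h>

  int main(void)
  {
      const char *path = "/sys/class/infiniband/mthca0/device/numa_node";
      FILE *f = fopen(path, "r");
      int node = -1;

      if (!f) {
          perror(path);
          return 1;
      }
      if (fscanf(f, "%d", &node) == 1)
          printf("HCA is attached to NUMA node %d\n", node);
      fclose(f);
      return 0;
  }

Comparing that value between the head node and a slave, and against
/sys/devices/system/node/node*/cpulist, might show whether core 0 really is
local to the adapter on each machine - though I'm not sure how reliable that
attribute is on this kernel.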

It makes sense that the bandwidth would be lower from one processor
than from another, but it still seems a little strange, since I would
only expect that 200 MB/s difference to show up if we were talking
about bandwidths over about 2 GB/s (I typically see about 2.3 GB/s per
processor).

Is it possible to use a NUMA library to specify which physical
processor to run on?  If so, I could try running the test on different
processors and observe the changes.
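
For instance, something like the sketch below (using sched_setaffinity
directly; a library like libnuma would presumably offer an equivalent) could
pin the process to a chosen core before the benchmark runs - this is just my
guess at how to do it, not something I've tried yet:

  /* Rough sketch: pin the calling process to one physical core.
   * Which core number corresponds to which socket is something
   * I'd still have to verify against the actual topology. */
  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      int core = (argc > 1) ? atoi(argv[1]) : 0;
      cpu_set_t mask;

      CPU_ZERO(&mask);
      CPU_SET(core, &mask);
      if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
          perror("sched_setaffinity");
          return 1;
      }
      printf("pinned to core %d\n", core);
      /* ... run the bandwidth test from here ... */
      return 0;
  }

Then I could run the receiver on core 0, 1, 2, and 3 in turn on the head node
and see whether the bandwidth tracks the distance to the adapter.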

I would like to try without IPoIB, but it may be some time before I
can do that, since it involves the head node and there are users other
than myself on parts of the cluster.

Thanks again for the help,
  Brian

On Wed, Mar 19, 2008 at 12:07 PM, LEI CHAI <chai.15 at osu.edu> wrote:
> Hi Brian,
>
>
>  > Thanks DK.  The systems are Opteron NUMA.  Both run processors at
>  > 2.4 GHz, and the memory is DDR2-667.  Both are using Mellanox
>  > InfiniHost III Lx HCAs.  Both nodes run Linux 2.6.24 with the
>  > InfiniBand drivers built in.
>
>  Thanks for the system information.
>
>
>  > It definitely has something to do with my head node.  I've been
>  > running tests on 3 nodes to better characterize what I've been seeing,
>  > and in my test I see close to 800 MB/s from slave to slave and about
>  > 600 MB/s from either slave to head or head to either slave.
>
>  Since they are Opteron NUMA systems, we suspect that it has something to do with the distance between the processor and the IB adapter.  I'm illustrating it in a diagram below:
>
>   node 0               node 1
>   3----2               2----3
>   |    |               |    |
>   |    |               |    |
>   1----0               0----1
>        |               |
>        adapter         adapter
>
>  In the above diagram core 0 is the closest to the adapter; therefore, if the core 0's on the two nodes communicate with each other you get the highest bandwidth. On the other hand, if the core 3's communicate, the bandwidth can be 200 MB/s less. We have observed this kind of behavior on Opteron NUMA systems before, and we have seen that the receiving process's position matters the most (the sender side has almost no impact).
>
>  Based on your description, it is very likely that the topology on your head node is different from that on the slave nodes - probably core 0 is nearest to the adapter on the slave nodes but farthest from it on the head node. Since by default the two processes run on core 0, when the head node is the receiver you could observe lower bandwidth. Could you verify the physical setup of your head node? It's preferable to make it identical to the slave nodes.
>
>
>  > If it's relevant, I'm also running IP over IB, and my head node runs
>  > iptables.  Could this cause the slowdown?
>
>  We feel that this shouldn't be the cause. To make sure, could you disable IPoIB and see if the bandwidth difference disappears?
>
>  Lei
>
>
>
>
>  >
>  > On Tue, Mar 18, 2008 at 7:36 PM, Dhabaleswar Panda
>  > <panda at cse.ohio-state.edu> wrote:
>  > >  What kind of platforms are you using - Opteron NUMA or Intel? Are the
>  > >  two systems homogeneous in terms of processor speed and memory speed?
>  > >  Do the two systems have identical NICs - hw and firmware? Such
>  > >  information will help to understand this problem better.
>  > >
>  > >  DK
>  > >
>  > >
>  > >
>  > >  On Sun, 16 Mar 2008, Brian Budge wrote:
>  > >
>  > >  > Hi all -
>  > >  >
>  > >  > I am running the osu_bw bandwidth test on my small cluster.  I'm
>  > >  > seeing some slightly strange behavior:  Let's say I have nodes 0 and
>  > >  > 1.  When I launch the bw test from node 0 and run the test on 0 and
>  > >  > 1, I see a max bandwidth of 650 MB/s.  However, when I run from node
>  > >  > 1 and run the test on 0 and 1, I see a max bandwidth of close to 850
>  > >  > MB/s.  Does anyone know how I might diagnose/fix this issue?  Has
>  > >  > anyone seen it before?
>  > >  >
>  > >  > Thanks,
>  > >  >   Brian
>  > >
>  > >
>
>

