[mvapich-discuss] big performance difference with 32 and 64 processes

Dhabaleswar Panda panda at cse.ohio-state.edu
Sat Mar 8 09:30:13 EST 2008


The performance degradation could have several causes; you need to
investigate systematically how your code interacts with the underlying
system and its configuration.

A few things to note are as follows:

- Your new cluster is a multi-core cluster (8 cores per node); I am not
  sure about your older cluster. Do you have enough memory on these
  nodes (on a per-core basis) compared to your old system? If you do
  not have enough memory, the application could be `thrashing' when
  running in fully-subscribed mode (all 8 cores). A small check program
  is sketched after this list.

- What is the speed of your InfiniBand cards - SDR or DDR? How many
  cards/ports are connected to each node of your system? For example,
  if you have one card (with one port) per node, all incoming/outgoing
  communication for a given node (all 8 cores) needs to go through the
  same card/port. The overall communication performance will also
  depend on the memory speed of each node.

- You can run the `Multiple Bandwidth Test' (available from the mvapich
  web page under the performance section) and examine whether the
  aggregate bandwidth keeps increasing as it goes from 1 pair (one core
  communicating with one core on a different node), to 2 pairs (two
  cores communicating with two cores), 4 pairs, 8 pairs, etc. If it is
  not increasing when going from 4 pairs to 8 pairs, inter-node
  communication performance on your system is not scaling with the
  number of cores per node. This will especially hurt if your
  application uses large messages and is `bandwidth-sensitive'. A rough
  sketch of such a multi-pair test appears after this list.

- To isolate any issues with your installation of mvapich2 1.0.2, you
  can also install mvapich 1.0 and check whether the same performance
  degradation appears when going from 32 to 64 cores.
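
To make the memory point concrete, here is a minimal sketch of a
per-node check (it assumes Linux's /proc/meminfo; the file name
memcheck.c and the launch line are just illustrative, not part of
mvapich). Launched with one rank per node, it reports each node's
total memory and swap, from which you can compute the per-core share
under full subscription:

/* memcheck.c -- minimal sketch: each rank reports its node's total
 * memory and swap so you can compute the per-core share when running
 * fully subscribed. Assumes a Linux /proc/meminfo. Launch with one
 * rank per node via your hostfile, e.g. (illustrative):
 *   mpiexec -np 24 ./memcheck
 */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char host[MPI_MAX_PROCESSOR_NAME];
    char line[256];
    int rank, len;
    FILE *fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);

    fp = fopen("/proc/meminfo", "r");
    if (fp) {
        while (fgets(line, sizeof(line), fp)) {
            /* MemTotal and SwapTotal are enough to spot thrashing risk */
            if (!strncmp(line, "MemTotal", 8) ||
                !strncmp(line, "SwapTotal", 9))
                printf("rank %d on %s: %s", rank, host, line);
        }
        fclose(fp);
    }
    MPI_Finalize();
    return 0;
}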
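
If you want to see the multi-pair behaviour outside the OSU suite, the
following rough sketch captures the idea (MSG_SIZE and ITERS are
arbitrary choices, not the OSU defaults, and there is no warm-up phase
as in the real test). Run it with 2, 4, 8 and 16 ranks and a hostfile
that places the two halves of MPI_COMM_WORLD on two different nodes,
then compare the aggregate numbers:

/* multi_bw.c -- rough sketch of a multi-pair bandwidth test.
 * Run with 2*N ranks: ranks 0..N-1 (ideally all on one node) each
 * stream large messages to a partner among ranks N..2N-1 (on another
 * node), with all pairs active at the same time. Placement across
 * nodes is up to your hostfile.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_SIZE (1 << 20)   /* 1 MB messages */
#define ITERS    100

int main(int argc, char **argv)
{
    int rank, size, npairs, peer, i;
    char *buf;
    double t0, t1, mbps;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size % 2) {           /* the pairing below needs an even count */
        if (rank == 0)
            fprintf(stderr, "need an even number of ranks\n");
        MPI_Finalize();
        return 1;
    }
    npairs = size / 2;

    buf = malloc(MSG_SIZE);
    /* first half sends, second half receives */
    peer = (rank < npairs) ? rank + npairs : rank - npairs;

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < ITERS; i++) {
        if (rank < npairs)
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
        else
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, peer, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    t1 = MPI_Wtime();

    if (rank == 0) {
        /* aggregate MB/s across all pairs, barrier to barrier */
        mbps = (double)npairs * ITERS * MSG_SIZE / (t1 - t0) / 1e6;
        printf("%d pair(s): aggregate %.1f MB/s\n", npairs, mbps);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

If the aggregate number is roughly flat between 4 and 8 pairs, the
single card/port (or the node's memory bandwidth) is likely the
bottleneck, which matches the second point above.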

Hope these guidelines help.

DK

On Sat, 8 Mar 2008, zhang yigang wrote:

> Dear All:
>
> We just bought a new cluster with two quad-core Xeons in each node. Altogether we have 24 nodes connected by InfiniBand. We mainly plan to use the cluster for quantum mechanics calculations using the code named VASP. We installed Linux, ifort and mvapich2-1.0.2.
>
> When we use 32 processes (either 4 nodes x 8 processes each or 8 nodes x 4 processes each), VASP seems to run just fine. The cluster, when tested using osu_bw, also seems to give encouraging results. A strange thing happens when the number of processes increases to 64: VASP slows down tremendously. The phenomenon is not observed on our home-made PC cluster built from Gigabit Ethernet and AMD Opteron machines (with the same ifort, VASP, and mpich2).
>
> On the vendor side, they say the osu_bw test is OK, so it is not a hardware problem. On the VASP side, it runs nicely on our old cluster with quite similar software. Maybe the problem can be solved by just turning a switch on or off.
>
> I have searched all the archives of the mailing list and did not find a clue, so I am writing this email to seek your kind help.
>
>
> with best regards!
>
> Yigang Zhang


