[mvapich-discuss] big performance difference with 32 and 64 processes
Dhabaleswar Panda
panda at cse.ohio-state.edu
Sat Mar 8 09:30:13 EST 2008
The performance degradation could be due to multiple reasons and you need
to systematically investigate how your code is interacting with the
underlying system and its configuration.
A few things to note are as follows:
- Your new cluster is a multi-core cluster (8 cores per node); I am not
  sure about your older cluster. Do you have enough memory on these nodes
  (on a per-core basis) compared to your old system? If you do not have
  enough memory, applications could be `thrashing' when running in
  fully-subscribed mode (all 8 cores).
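As a rough illustration of the per-core memory check above (the node memory sizes and the node core counts below are hypothetical examples, not measurements from either cluster):

```python
# Sketch: compare memory available per MPI process on two clusters.
# All numbers here are hypothetical placeholders; substitute the real
# per-node memory and core counts of your systems.

def memory_per_core(node_mem_gb, cores_per_node):
    """Memory available to each process in fully-subscribed mode."""
    return node_mem_gb / cores_per_node

old = memory_per_core(node_mem_gb=8, cores_per_node=2)   # e.g. old dual-core node
new = memory_per_core(node_mem_gb=16, cores_per_node=8)  # dual quad-core Xeon node

print(f"old cluster: {old:.1f} GB/core, new cluster: {new:.1f} GB/core")
# If the per-core figure shrinks, a job that fit in memory at 4
# processes per node may start swapping ("thrashing") at 8.
```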
- What is the speed of your InfiniBand card - SDR or DDR? How many
  cards/ports are connected to each node of your system? For example, if
  you have one card (with one port) per node, all incoming/outgoing
  communication for a given node (all 8 cores) needs to go through the
  same card/port. The overall communication performance will also depend
  on the memory speed of each node.
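To get a rough sense of this sharing effect, you can divide the link rate by the number of active cores. The sketch below assumes a single 4x link per node and uses approximate effective data rates (after 8b/10b encoding, roughly 1 GB/s for SDR and 2 GB/s for DDR); these are nominal figures, not measurements:

```python
# Approximate effective data rates for one 4x InfiniBand link, in GB/s
# (nominal best-case figures after encoding overhead, not measurements).
LINK_GBPS = {"SDR": 1.0, "DDR": 2.0}

def per_core_share(link, cores_active):
    """Best-case bandwidth per core when all cores share one port."""
    return LINK_GBPS[link] / cores_active

print(f"DDR, 4 cores/node: {per_core_share('DDR', 4):.2f} GB/s per core")
print(f"DDR, 8 cores/node: {per_core_share('DDR', 8):.2f} GB/s per core")
# Going from 4 to 8 active cores halves the best-case share per core,
# which matters most for large, bandwidth-bound messages.
```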
- You can run the `Multiple Bandwidth Test' (available from the mvapich
  web page under the performance section). Examine whether aggregate
  bandwidth increases as you go from 1 pair (one core communicating with
  one core on a different node), to 2 pairs (two cores communicating with
  two cores), to 4 pairs, 8 pairs, etc. If it does not increase when
  going from 4 pairs to 8 pairs, inter-node communication performance on
  your system is not scaling with the number of cores per node. This will
  especially hurt if your application uses large messages and is
  `bandwidth-sensitive'.
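The check described above can be sketched as follows. The bandwidth numbers are made-up placeholders, not results from any real run of the benchmark; substitute the aggregate figures your test reports:

```python
# Sketch: flag pair counts where aggregate multi-pair bandwidth stopped
# scaling. The numbers below are hypothetical placeholders; replace them
# with the aggregate MB/s figures from the Multiple Bandwidth Test.
measured = {1: 950.0, 2: 1850.0, 4: 1900.0, 8: 1910.0}  # pairs -> MB/s

def scaling_stalls(results, threshold=1.2):
    """Return pair counts where doubling the pairs gained < 20% bandwidth."""
    pairs = sorted(results)
    return [q for p, q in zip(pairs, pairs[1:])
            if results[q] < threshold * results[p]]

print(scaling_stalls(measured))
# An early stall (e.g. already at 4 pairs) suggests the shared link or
# the node's memory bandwidth is saturated before all 8 cores are busy.
```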
- To isolate any issues with your installation of mvapich2 1.0.2, you
  can also install mvapich 1.0 and see whether the same performance
  degradation appears when going from 32 to 64 cores.
Hope these guidelines help.
DK
On Sat, 8 Mar 2008, zhang yigang wrote:
> Dear All:
>
> We just bought a new cluster with dual quad-core Xeons on each node. Altogether we have 24 nodes connected by InfiniBand. We mainly plan to use the cluster for quantum mechanics calculations using the code named VASP. We installed Linux, ifort and mvapich2-1.0.2.
>
> When we use 32 processes (either 4 nodes x 8 processes per node, or 8 nodes x 4 processes per node), VASP seems to run just fine. The cluster, when tested using osu_bw, also gives encouraging results. A strange thing happens when the number of processes increases to 64: VASP slows down tremendously. The phenomenon is not observed on our home-made PC cluster built from Gigabit Ethernet and AMD Opteron machines (with the same ifort, VASP, and mpich2).
>
> On the vendor side, they say the osu_bw test is OK, so it is not a hardware problem. On the VASP side, it runs nicely on our old cluster with quite similar software. Maybe the problem can be solved by just turning a switch on or off.
>
> I have searched all the archives of the mailing list and did not find a clue, so I am writing this email to seek your kind help.
>
>
> with best regards!
>
> yigang Zhang