[mvapich-discuss] big performance difference with 32 and 64 processes
zhang yigang
zhangyg at mail.iggcas.ac.cn
Thu Mar 13 08:02:01 EDT 2008
Dear Professor Panda:
Thanks for your response. It took me several days to finally get MVAPICH 1.0 running smoothly on my new cluster. As you suggested, when I use MVAPICH instead of MVAPICH2, 64 processes do run faster than 32 processes.
Thanks a lot for your suggestion! It helps a lot.
Now I am trying to tune other parameters to make VASP run as fast as possible (VASP is a program for quantum-mechanics molecular dynamics calculations). For such a big program, what suggestions would you have?
With best regards!
yigang Zhang
----- Original Message -----
From: "Dhabaleswar Panda" <panda at cse.ohio-state.edu>
To: "zhang yigang" <zhangyg at mail.iggcas.ac.cn>
Cc: <mvapich-discuss at cse.ohio-state.edu>
Sent: Saturday, March 08, 2008 10:30 PM
Subject: Re: [mvapich-discuss] big performance difference with 32 and 64 processes
> The performance degradation could be due to multiple reasons and you need
> to systematically investigate how your code is interacting with the
> underlying system and its configuration.
>
> A couple of things to note are as follows:
>
> - Your new cluster is a multi-core cluster (8 cores per node). Not sure
>   about your older cluster. Do you have enough memory on these nodes (on a
>   per-core basis) compared to your old system? If you do not have enough
>   memory, applications could be going through `thrashing' when running
>   in fully-subscribed mode (all 8 cores). A small C sketch for checking
>   the per-core memory is included after this list.
>
> - What is the speed of your InfiniBand card - SDR/DDR? How many
>   cards/ports are connected to each node of your system? For example, if you
>   have one card (with one port) per node, all incoming/outgoing
>   communication from a given node (all 8 cores) needs to go through the
>   same card/port. The overall communication performance will also depend
>   on the memory speed of each node.
>
> - You can run the `Multiple Bandwidth Test' (available from the mvapich
>   web page under the performance section). You can look at the results and
>   examine whether the aggregate bandwidth keeps increasing as the test goes
>   from 1 pair (one core communicating with one core on a
>   different node), to 2 pairs (two cores communicating with two cores),
>   4 pairs, 8 pairs, etc. If it does not increase when going from
>   4 pairs to 8 pairs, it indicates that inter-node communication
>   performance on your system is not scaling with an increasing
>   number of cores per node. This will especially hurt
>   if your application uses large messages and is
>   `bandwidth-sensitive'. A minimal MPI sketch of this multi-pair
>   measurement is included after this list.
>
> - To isolate any issues with the installation of mvapich2 1.0.2,
>   you can also try installing mvapich 1.0 and check whether you see
>   the same performance degradation when going from 32 to 64 cores.
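>
> A quick sanity check for the per-core memory question above is a few
> lines of C (a minimal sketch, assuming only standard POSIX sysconf()
> calls; _SC_NPROCESSORS_ONLN is a common glibc extension, and the file
> name and output format here are just illustrative):
>
> /* memcheck.c - report physical memory per core on one node.
>  * Compile with: cc memcheck.c -o memcheck
>  */
> #include <stdio.h>
> #include <unistd.h>
>
> int main(void)
> {
>     long pages     = sysconf(_SC_PHYS_PAGES);       /* physical pages        */
>     long page_size = sysconf(_SC_PAGESIZE);         /* bytes per page        */
>     long cores     = sysconf(_SC_NPROCESSORS_ONLN); /* online cores (8 here) */
>
>     double total_gb = (double)pages * page_size / (1024.0 * 1024.0 * 1024.0);
>
>     printf("total memory : %.2f GB\n", total_gb);
>     printf("cores online : %ld\n", cores);
>     printf("mem per core : %.2f GB\n", total_gb / cores);
>     /* If 'mem per core' is well below what the old cluster offered, a
>        fully-subscribed (8 processes/node) VASP run may swap or thrash. */
>     return 0;
> }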
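>
> To make the multi-pair measurement concrete, the idea can be sketched in
> MPI as follows (this is only an illustration, not the OSU benchmark
> source; the message size, window and iteration counts are made-up
> values). Run it with an even number of ranks, placing the sending half
> on one node and the receiving half on another, and compare the printed
> aggregate MB/s for 1, 2, 4 and 8 pairs:
>
> /* multipair_bw.c - aggregate bandwidth with several pairs of ranks. */
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> #define MSG_SIZE (1 << 20)   /* 1 MB messages                    */
> #define WINDOW   64          /* messages in flight per iteration */
> #define ITERS    20          /* timed iterations                 */
>
> int main(int argc, char **argv)
> {
>     int rank, size, pairs, peer, it, w;
>     char *buf;
>     MPI_Request req[WINDOW];
>     MPI_Status  stat[WINDOW];
>     double t0, elapsed;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>     pairs = size / 2;                 /* first half sends, second half receives */
>     peer  = (rank < pairs) ? rank + pairs : rank - pairs;
>
>     buf = malloc(MSG_SIZE);
>     memset(buf, rank, MSG_SIZE);
>
>     MPI_Barrier(MPI_COMM_WORLD);
>     t0 = MPI_Wtime();
>
>     for (it = 0; it < ITERS; it++) {
>         if (rank < pairs) {           /* sender: keep WINDOW sends in flight */
>             for (w = 0; w < WINDOW; w++)
>                 MPI_Isend(buf, MSG_SIZE, MPI_CHAR, peer, 0, MPI_COMM_WORLD, &req[w]);
>             MPI_Waitall(WINDOW, req, stat);
>             MPI_Recv(buf, 1, MPI_CHAR, peer, 1, MPI_COMM_WORLD, &stat[0]);  /* ack */
>         } else {                      /* receiver: pre-post matching receives */
>             for (w = 0; w < WINDOW; w++)
>                 MPI_Irecv(buf, MSG_SIZE, MPI_CHAR, peer, 0, MPI_COMM_WORLD, &req[w]);
>             MPI_Waitall(WINDOW, req, stat);
>             MPI_Send(buf, 1, MPI_CHAR, peer, 1, MPI_COMM_WORLD);            /* ack */
>         }
>     }
>
>     MPI_Barrier(MPI_COMM_WORLD);
>     elapsed = MPI_Wtime() - t0;
>
>     if (rank == 0) {
>         double bytes = (double)pairs * ITERS * WINDOW * MSG_SIZE;
>         printf("%d pairs: aggregate %.1f MB/s\n",
>                pairs, bytes / elapsed / (1024.0 * 1024.0));
>         /* If this number stops growing between 4 and 8 pairs, the shared
>            HCA/port or the node's memory bus is the likely bottleneck. */
>     }
>
>     free(buf);
>     MPI_Finalize();
>     return 0;
> }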
>
> Hope these guidelines help.
>
> DK
>
> On Sat, 8 Mar 2008, zhang yigang wrote:
>
> > Dear All:
> >
> > We just bought a new cluster with dual quad-core Xeons on each node. Altogether we have 24 nodes connected by InfiniBand. We mainly plan to use the cluster for quantum-mechanics calculations with the code named VASP. We installed Linux, ifort and mvapich2-1.0.2.
> >
> > When we use 32 processes (either 4 nodes x 8 processes per node or 8 nodes x 4 processes per node), VASP seems to run just fine. The cluster, when tested using OSU_bw, also seems to give encouraging results. A strange thing happens when the number of processes increases to 64: VASP slows down tremendously. This phenomenon is not observed on our home-made PC cluster built from Gigabit Ethernet and AMD Opteron machines (with the same ifort, VASP, mpich2).
> >
> > On the vendor side, they say the osu_bw test is OK, so it is not a hardware problem. On the VASP side, it runs nicely on our old cluster with quite similar software. Maybe the problem can be solved by just turning a switch on or off.
> >
> > I have searched all the archives of the mailing list and did not find a clue, so I am writing this email to seek your kind help.
> >
> >
> > with best regards!
> >
> > yigang Zhang