[mvapich-discuss] big performance difference with 32 and 64 processes

zhang yigang zhangyg at mail.iggcas.ac.cn
Thu Mar 13 08:02:01 EDT 2008


Dear Professor Panda:

Thanks for your response. It took me several days, but I finally have MVAPICH-1.0 running smoothly on my new cluster. As you suggested, when I use MVAPICH instead of MVAPICH2, 64 processes do run faster than 32 processes. 

Thanks a lot for your suggestion; it helped a great deal.

Now I am trying to tune other parameters so that VASP runs as fast as possible (VASP is a program for quantum-mechanics molecular dynamics calculations). For such a big program, what suggestions would you have?

With best regards!

yigang Zhang

----- Original Message ----- 
From: "Dhabaleswar Panda" <panda at cse.ohio-state.edu>
To: "zhang yigang" <zhangyg at mail.iggcas.ac.cn>
Cc: <mvapich-discuss at cse.ohio-state.edu>
Sent: Saturday, March 08, 2008 10:30 PM
Subject: Re: [mvapich-discuss] big performance difference with 32 and 64 processes


> The performance degradation could be due to multiple reasons and you need
> to systematically investigate how your code is interacting with the
> underlying system and its configuration.
> 
> A couple of things to note are as follows:
> 
> - Your new cluster is a multi-core cluster (8 cores per node). Not sure
>   about your older cluster. Do you have enough memory on these nodes (on
>   a per-core basis) compared to your old system? If you do not have
>   enough memory, applications could be going through `thrashing' when
>   running in fully-subscribed mode (all 8 cores).
> 
> - What is the speed of your InfiniBand card - SDR/DDR? How many
>   cards/ports are connected to each node of your system? For example, if
>   you have one card (with one port) per node, all incoming/outgoing
>   communication from a given node (all 8 cores) needs to go through the
>   same card/port. The overall communication performance will also depend
>   on the memory speed of each node.
> 
> - You can run the `Multiple Bandwidth Test' (available from the mvapich
>   web page under the performance section). Look at the results and
>   examine whether the aggregate bandwidth increases when going from
>   1 pair (one core communicating with one core on a different node)
>   to 2 pairs (two cores communicating with two cores), 4 pairs,
>   8 pairs, etc. If it does not increase when going from 4 pairs to
>   8 pairs, that indicates that inter-node communication performance
>   on your system is not scaling with the number of cores per node.
>   This will especially hurt if your application uses large messages
>   and is `bandwidth-sensitive'. (A minimal sketch of such a
>   multi-pair test appears after this list.)
> 
> - To isolate any issues with the installation of mvapich2 1.0.2,
>   you can also try installing mvapich 1.0 and check whether you see
>   the same performance degradation when going from 32 to 64 cores.
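> 
>   As a rough illustration only, here is a minimal MPI/C sketch of such a
>   multi-pair measurement. It is not the actual OSU benchmark code; the
>   message size, the message count, and the assumption that the first
>   half of the ranks land on one node and the second half on the other
>   are choices made just for this sketch, so please use the benchmark
>   from the mvapich web page for real numbers. Running it across two
>   nodes with 2, 4, 8, and 16 processes corresponds to 1, 2, 4, and
>   8 pairs.
> 
>   #include <mpi.h>
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <string.h>
> 
>   #define MSG_SIZE (1 << 20)   /* 1 MB per message           */
>   #define NUM_MSGS 64          /* messages sent by each pair */
> 
>   int main(int argc, char **argv)
>   {
>       int rank, size, pairs, i;
>       char *buf;
>       double t0, elapsed;
> 
>       MPI_Init(&argc, &argv);
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>       MPI_Comm_size(MPI_COMM_WORLD, &size);
> 
>       pairs = size / 2;
>       buf = (char *) malloc(MSG_SIZE);
>       memset(buf, rank, MSG_SIZE);
> 
>       MPI_Barrier(MPI_COMM_WORLD);
>       t0 = MPI_Wtime();
> 
>       if (rank < pairs) {
>           /* senders (assumed to be placed on the first node) */
>           for (i = 0; i < NUM_MSGS; i++)
>               MPI_Send(buf, MSG_SIZE, MPI_CHAR, rank + pairs, 0,
>                        MPI_COMM_WORLD);
>       } else if (rank < 2 * pairs) {
>           /* receivers (assumed to be placed on the second node) */
>           for (i = 0; i < NUM_MSGS; i++)
>               MPI_Recv(buf, MSG_SIZE, MPI_CHAR, rank - pairs, 0,
>                        MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>       }
> 
>       MPI_Barrier(MPI_COMM_WORLD);
>       elapsed = MPI_Wtime() - t0;
> 
>       if (rank == 0) {
>           /* aggregate bandwidth over all pairs, in MB/s */
>           double total_mb = (double) pairs * NUM_MSGS * MSG_SIZE / 1e6;
>           printf("%d pair(s): %.2f MB/s aggregate\n",
>                  pairs, total_mb / elapsed);
>       }
> 
>       free(buf);
>       MPI_Finalize();
>       return 0;
>   }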
> 
> Hope these guidelines help.
> 
> DK
> 
> On Sat, 8 Mar 2008, zhang yigang wrote:
> 
> > Dear All:
> >
> > We just bought a new cluster with dual quad-core Xeons on each node. Altogether we have 24 nodes connected by InfiniBand. We mainly plan to use the cluster for quantum-mechanics calculations using the code named VASP. We installed Linux, ifort, and mvapich2-1.0.2.
> >
> > When we use 32 processes (either 4 nodes x 8 processes per node or 8 nodes x 4 processes per node), VASP seems to run just fine. The cluster, when tested using OSU_bw, also seems to give encouraging results. A strange thing happens when the number of processes increases to 64: VASP slows down tremendously. The phenomenon is not observed on our home-made PC cluster built from Gigabit Ethernet and AMD Opteron machines (with the same ifort, VASP, and mpich2).
> >
> > On the vendor side, they say the osu_bw test is OK, so it is not a hardware problem. On the VASP side, it runs nicely on our old cluster with quite similar software. Maybe the problem can be solved by just turning a switch on or off.
> >
> > I have searched all the archives of the mailing list and did not find a clue, so I am writing this email to seek your kind help.
> >
> >
> > With best regards!
> >
> > yigang Zhang


