[mvapich-discuss] Same nodes different time?

Daniel WEI lakeat at gmail.com
Thu Mar 27 18:20:20 EDT 2014


I tested many different numbers of cores; the case I mentioned actually
uses 320 cores. The difference shocked me. I will ask our system
administrator about the network; otherwise, the comparison is
meaningless. Thanks for your help.

Yes, I am always using --no-prec-div for compiling, and you are right, I
can test with just one node. You mentioned that 5-10 million cells may scale
well on 64 cores. So I guess (it seems you are very familiar with CFD :) ) you
mean it's better to have 70,000-100,000 cells/core. But my experience shows
that 30,000-50,000 is a good estimate; I can't get better performance above
50,000 cells/core. I am running unsteady analysis.
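To make the two rules of thumb concrete, here is a small sketch; the mesh size and the cells/core targets are just the figures from this thread, not measurements:

```python
import math

def cores_for(total_cells, cells_per_core):
    """Number of cores needed to hit a given cells-per-core target."""
    return math.ceil(total_cells / cells_per_core)

mesh = 10_000_000  # a 10M-cell mesh, the upper end of the 5-10M estimate

# Tony's suggestion (~70k-100k cells/core) vs. my experience (~30k-50k):
print(cores_for(mesh, 100_000))  # -> 100 cores
print(cores_for(mesh, 50_000))   # -> 200 cores
```

So the two heuristics differ by roughly a factor of two in how many cores the same mesh should be spread over.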

I am really looking forward to seeing how my code scales up to 1000-2000 cores.
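Before trusting any -O2 vs. -O3 comparison, it is probably worth quantifying the run-to-run noise first. A minimal sketch, using the three initialization times mentioned earlier in this thread (in practice you would fill the list from repeated runs of your own case):

```python
from statistics import mean, stdev

def summarize(times):
    """Mean, sample std. dev., and coefficient of variation of wall times."""
    m, s = mean(times), stdev(times)
    return m, s, s / m

# Initialization times (seconds) from three runs of the same case:
init_times = [37.0, 5.0, 2.0]
m, s, cv = summarize(init_times)
print(f"mean={m:.1f}s stdev={s:.1f}s cv={cv:.2f}")
# -> mean=14.7s stdev=19.4s cv=1.32
```

A coefficient of variation above 1 means the noise swamps any few-percent gain a compiler flag could give, so the flag comparison is only meaningful once repeated runs of the same binary agree closely.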





Zhigang Wei
----------------------
*University of Notre Dame*


On Thu, Mar 27, 2014 at 6:09 PM, Tony Ladd <tladd at che.ufl.edu> wrote:

> It can be large in my experience. I found in some cases a 10G InfiniBand
> network (in that case our university's HPC system) could be slower than
> gigabit Ethernet (with highly optimized drivers and a flat network). It's
> true you have sole access to the nodes, but not sole access to the switch
> network. One thing to do is ask the sysadmins if they will set aside a
> portion of the network for you sometime.
>
> However, before doing that you might consider the following. A CFD code
> should not be communication bound - the surface-to-volume ratio works in
> your favor if you have enough cells per processor. I notice you were using
> 64 cores - at a rough guess, your problem should be 5-10 million cells to
> get good parallel performance. If you want to test the compiler, I would
> run on just one node (perhaps with a smaller problem). That is all you
> need to check the compiler - or even a single process. I doubt you will
> find much difference between O2 and O3. Better to check the web for some
> magic flags. I found --no-prec-div (I think that was it) sped up one of my
> codes by a factor of 2. It stops the compiler from doing fully
> IEEE-compliant division (or something like that), which can sometimes make
> a big difference. Someone at TACC put me on to that. There may well be others.
>
> Tony
>
>
> On 03/27/2014 05:58 PM, Daniel WEI wrote:
>
>>
>> On Thu, Mar 27, 2014 at 5:45 PM, Tony Ladd <tladd at che.ufl.edu> wrote:
>>
>>     So your performance can vary depending on what else is going on
>>     with the other nodes in the system
>>
>>
>> Thank you Tony. I see.
>>
>> (1) But how much variance?! My results show some very disturbing
>> differences: in one case, initialization takes 37 s; in another, 5 s; in
>> yet another, 2 s!
>> (2) How can I do my best (or ask someone else to do their best) to reduce
>> this variance? (There are 16 cores/node, so nobody else should be using
>> the nodes I was allocated; this seems to be guaranteed.)
>> (3) My goal is to compare the speed of my CFD code when built with the
>> Intel compiler's -O3 vs. -O2, but if my performance varies even for the
>> same case on the same hosts, how can I trust my results anymore?
>>
>>
>>
>>
>>
>>
>> Zhigang Wei
>> ----------------------
>> /University of Notre Dame/
>>
>
> --
> Tony Ladd
>
> Chemical Engineering Department
> University of Florida
> Gainesville, Florida 32611-6005
> USA
>
> Email: tladd-"(AT)"-che.ufl.edu
> Web    http://ladd.che.ufl.edu
>
> Tel:   (352)-392-6509
> FAX:   (352)-392-9514
>
>

