[mvapich-discuss] Same nodes different time?

Sourav Chakraborty chakraborty.52 at osu.edu
Thu Mar 27 21:32:59 EDT 2014


Hi Daniel,

> In my case, the "reading in" of the velocity field and pressure field
> can occasionally differ hugely (37 seconds in one case, 3 seconds in
> another).
>

By that, do you mean reading the input file? If so, what filesystem
are you reading from?
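
On the measurement itself: clock() reports processor (CPU) time, not
wall-clock time, so time a rank spends blocked waiting on I/O may not be
counted by it at all. If the read phase is the suspect, it may be worth
timing that phase on its own with MPI_Wtime (wall-clock seconds) and a
barrier, so every rank measures the same region. Below is only a minimal
sketch; read_fields() is a placeholder name standing in for your own
input routine:

#include <mpi.h>
#include <stdio.h>

/* Placeholder: replace with your actual routine that reads the
 * velocity and pressure fields. */
static void read_fields(void) { }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);   /* all ranks enter the timed region together */
    double t0 = MPI_Wtime();

    read_fields();                 /* the I/O phase to isolate */

    MPI_Barrier(MPI_COMM_WORLD);   /* wait for the slowest rank */
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("read phase: %.3f s (wall clock)\n", t1 - t0);

    MPI_Finalize();
    return 0;
}

If the spread shows up in that number alone, the variance is in the
filesystem/network path rather than in the solver or the compiler flags.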

Sourav Chakraborty
The Ohio State University


On Thu, Mar 27, 2014 at 8:56 PM, Daniel WEI <lakeat at gmail.com> wrote:

> The measurement is implemented in my C++ code using "sys/times.h", for
> example:
>
> clock_t start = clock();
> ... /* Do the work. */
> clock_t end = clock();
> double elapsed = ((double) (end - start)) / CLOCKS_PER_SEC;
>
> I have tried both 5~20 minute jobs and 0.5~3 hour jobs, and they all
> show differences. Call the first JOB-A and the latter JOB-B.
> At first I was testing JOB-B, and I found a difference even though the
> hosts were the same (just in a different order). So today I started
> testing a smaller job, JOB-A, and fixed the order of hosts by manually
> creating a hostfile. Even with the same order of hosts, the results are
> still different.
>
> I don't understand what you meant by "warm up", "startup/wrapup", etc.
> In my case, the "reading in" of the velocity field and pressure field
> can occasionally differ hugely (37 seconds in one case, 3 seconds in
> another).
>
> I guess Tony's point makes sense, that the problem is in the switches,
> but I am not sure.
>
>
>
>
>
> Zhigang Wei
> ----------------------
> *University of Notre Dame*
>
>
> On Thu, Mar 27, 2014 at 7:58 PM, Gus Correa <gus at ldeo.columbia.edu> wrote:
>
>> On 03/27/2014 05:58 PM, Daniel WEI wrote:
>>
>>>
>>> On Thu, Mar 27, 2014 at 5:45 PM, Tony Ladd <tladd at che.ufl.edu
>>> <mailto:tladd at che.ufl.edu>> wrote:
>>>
>>>     So your performance can vary depending on what else is going on with
>>>     the other nodes in the system
>>>
>>>
>>> Thank you Tony. I see.
>>>
>>> (1) But how much variance?! My results show some very disturbing
>>> differences: in one case, initializing takes 37s, in another 5s, and
>>> in yet another 2s!
>>> (2) What can I (or somebody else) do to reduce this variance? (There
>>> are 16 cores/node, so nobody else should be using the nodes I was
>>> running on; this seems to be guaranteed.)
>>> (3) My goal is to compare the Intel compiler's -O3 and -O2 when
>>> building my CFD code, in terms of speed, but if my performance varies
>>> even with the same case and the same hosts, how can I trust my results
>>> anymore?
>>> Zhigang Wei
>>> ----------------------
>>> /University of Notre Dame/
>>>
>>>
>> Hi Zhigang
>>
>> What time are you measuring?
>> Wall time from the job scheduler for the whole job?
>> Wall time for the application only (say with Unix time utility or
>> MPI_Wtime)?
>> Something else?
>>
>> Have you tried to run your test simulations for a longer time (several
>> minutes, one hour perhaps, not just a few seconds)
>> to see if the outcome shows less spread?
>> Say, you could change the number of time steps to 100x
>> or perhaps 10,000x what you are currently using,
>> depending of course on the max walltime allowed by your cluster queue.
>>
>> My wild guess is that with short-lived simulations
>> what may count most is the job or application
>> startup and wrapup times, which may vary significantly in a cluster,
>> especially in a big cluster, overwhelming and obscuring your program's
>> execution time.
>> Most MPI and benchmark implementations recommend
>> that you "warm up" your own tests/benchmarks
>> for a time long enough to reduce such startup/wrapup effects.
>>
>> My two cents,
>> Gus Correa
>>
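
To make the "warm up" suggestion above concrete, here is a rough sketch
(do_timestep() is only a placeholder standing in for one iteration of the
solver): run the kernel once untimed to absorb one-time startup costs,
then time several repetitions with MPI_Wtime and compare the spread.

#include <mpi.h>
#include <stdio.h>

/* Placeholder: one iteration of the solver. */
static void do_timestep(void) { }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    do_timestep();                      /* warm-up pass, not timed */

    for (int rep = 0; rep < 5; rep++) { /* repeat to see the run-to-run spread */
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();

        for (int step = 0; step < 1000; step++)
            do_timestep();

        MPI_Barrier(MPI_COMM_WORLD);
        double t1 = MPI_Wtime();
        if (rank == 0)
            printf("rep %d: %.3f s\n", rep, t1 - t0);
    }

    MPI_Finalize();
    return 0;
}

If the repetitions agree with each other but the overall job time still
varies, the variance is coming from startup, I/O, or the scheduler rather
than from the solver itself, and an -O2 vs -O3 comparison based on the
timed loop should still be meaningful.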