[mvapich-discuss] Same nodes different time?

Daniel WEI lakeat at gmail.com
Thu Mar 27 22:00:12 EDT 2014


Wait, as I have said already, I think I am using the same set of hosts. I
manually create a hosts file and launch with mpirun -hostfile hosts -np N
blahblah.  :)
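
(For reference, a minimal sketch of that setup, with hypothetical node names
and a hypothetical binary; with a Hydra-based mpirun such as MVAPICH2's, a
hostfile lists one host per line, optionally with a process count per host:)

$ cat hosts
node001:16
node002:16
node003:16
$ mpirun -hostfile hosts -np 48 ./my_cfd_app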





Zhigang Wei
----------------------
*University of Notre Dame*


On Thu, Mar 27, 2014 at 9:54 PM, Hari Subramoni <subramoni.1 at osu.edu> wrote:

> Hello Daniel,
>
> I believe that what you just mentioned is not the same as what I indicated.
> Let me try to explain. If you submit the same job two times to SGE (qsub
> job.sh; wait until it ends; qsub job.sh), the two jobs may get executed on
> different sets of hosts. The scheduler does not guarantee that they will run
> on the same set of hosts. As others indicated on the list, this can lead to
> variability in results due to network and topology effects. However, if you
> run the job as I indicated, both executions of the application are
> guaranteed to run on the same set of hosts. Thus, you can minimize the
> impact of network and topology on the performance of the application. Does
> this make sense?
>
> Thx,
> Hari.
>
>
> On Thu, Mar 27, 2014 at 9:45 PM, Daniel WEI <lakeat at gmail.com> wrote:
>
>> Hari,
>>
>>
>> I think I have already done this: in the same case folder, I use SGE to
>> submit the job script, and after it finishes, I submit it again. Note that
>> the set of CPUs I can access is fixed at my school, so nobody else is
>> using those nodes. The job is launched with mpirun -hostfile hosts -np 320
>> blahblah.
>>
>> The output file is written only once, at the end of the simulation, so it
>> does not affect my time measurement. The input file is likewise read only
>> once, at the beginning.
>>
>>
>>
>>
>>
>> Zhigang Wei
>> ----------------------
>> *University of Notre Dame*
>>
>>
>> On Thu, Mar 27, 2014 at 9:34 PM, Hari Subramoni <subramoni.1 at osu.edu> wrote:
>>
>>> Hello Daniel,
>>>
>>> Can you try two back-to-back runs on the same set of hosts and see if
>>> there is any variance in performance? To be clear, this is what I mean:
>>>
>>> In interactive mode
>>> --------------------------
>>> 1. Request a set of nodes from the scheduler
>>> 2. Run the application; note the time
>>> 3. Run the application again; note the time
>>>
>>> In batch mode
>>> ---------------------
>>> Create a shell script that runs the application twice, like the one
>>> below, and submit it to SGE:
>>>
>>> #!/bin/bash
>>> for i in $(seq 1 2)
>>> do
>>>     # "./app" is a placeholder for your actual mpirun command line
>>>     start=$(date +%s)
>>>     mpirun -hostfile hosts -np 320 ./app
>>>     end=$(date +%s)
>>>     echo "run $i: $((end - start)) seconds"
>>> done
>>>
>>> You mention "reading in of the velocity field and pressure field". Does
>>> this involve any file system operation (like reading a file, writing to a
>>> file, etc.)? If you're touching the file system, the performance can vary
>>> wildly, and this has nothing to do with the MPI library.
>>>
>>> Regards,
>>> Hari.
>>>
>>>
>>> On Thu, Mar 27, 2014 at 8:56 PM, Daniel WEI <lakeat at gmail.com> wrote:
>>>
>>>> Measurement is implemented in my C++ code using clock() from <ctime>
>>>> (not <sys/times.h>), for example:
>>>>
>>>> #include <ctime>   /* declares clock() and CLOCKS_PER_SEC */
>>>>
>>>> clock_t start = clock();
>>>> ... /* Do the work. */
>>>> clock_t end = clock();
>>>> double elapsed = ((double) (end - start)) / CLOCKS_PER_SEC;
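>>>>
>>>> (Note that clock() measures CPU time consumed by the process, not elapsed
>>>> wall time; a minimal wall-clock sketch, assuming MPI_Init has already
>>>> been called, would use MPI_Wtime instead:)
>>>>
>>>> #include <mpi.h>
>>>>
>>>> double t0 = MPI_Wtime();            /* wall-clock seconds */
>>>> ... /* Do the work. */
>>>> double elapsed = MPI_Wtime() - t0;  /* elapsed wall time in seconds */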
>>>>
>>>> I have tried both 5~20 minute jobs and 0.5~3 hour jobs; they all show
>>>> differences. Let's say the first is JOB-A and the latter is JOB-B.
>>>> At first I was testing JOB-B, and I found a difference even though the
>>>> hosts were the same (just the order of hosts was different). So today I
>>>> started testing a smaller job, JOB-A, and I fixed the order of hosts by
>>>> manually creating a hostfile, and I found that even with the same order
>>>> of hosts, the results are still different.
>>>>
>>>> I don't understand what you meant by "warm up", "startup/wrapup", etc. In
>>>> my case, the "reading in" of the velocity field and pressure field can
>>>> occasionally differ hugely (37 seconds in one case, 3 seconds in another).
>>>>
>>>> I guess Tony's point makes sense, that the problem is in the switches,
>>>> but I am not sure.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Zhigang Wei
>>>> ----------------------
>>>> *University of Notre Dame*
>>>>
>>>>
>>>> On Thu, Mar 27, 2014 at 7:58 PM, Gus Correa <gus at ldeo.columbia.edu> wrote:
>>>>
>>>>> On 03/27/2014 05:58 PM, Daniel WEI wrote:
>>>>>
>>>>>>
>>>>>> On Thu, Mar 27, 2014 at 5:45 PM, Tony Ladd <tladd at che.ufl.edu> wrote:
>>>>>>
>>>>>>     So your performance can vary depending on what else is going on
>>>>>> with
>>>>>>     the other nodes in the system
>>>>>>
>>>>>>
>>>>>> Thank you Tony. I see.
>>>>>>
>>>>>> (1) But how much variance?! My results show some very disturbing
>>>>>> differences: in one run, initializing the case takes 37s, in another
>>>>>> 5s, and in yet another 2s!!!
>>>>>> (2) How can I, or anyone else, best reduce this variance? (There are
>>>>>> 16 cores/node, so nobody else should be using the nodes I was calling;
>>>>>> this seems to be guaranteed.)
>>>>>> (3) My goal is to compare the speed of my CFD code when built with the
>>>>>> Intel compiler's -O3 versus -O2, but if my performance varies even with
>>>>>> the same case and same hosts, how can I trust my results anymore...?
>>>>>> Zhigang Wei
>>>>>> ----------------------
>>>>>> /University of Notre Dame/
>>>>>>
>>>>>>
>>>>> Hi Zhigang
>>>>>
>>>>> What time are you measuring?
>>>>> Wall time from the job scheduler for the whole job?
>>>>> Wall time for the application only (say, with the Unix time utility or
>>>>> MPI_Wtime)?
>>>>> Something else?
>>>>>
>>>>> Have you tried running your test simulations for longer (several
>>>>> minutes, perhaps an hour, not just a few seconds)
>>>>> to see if the outcome shows less spread?
>>>>> Say, you could change the number of time steps to 100x
>>>>> or perhaps 10,000x what you are currently using,
>>>>> depending of course on the max walltime allowed by your cluster queue.
>>>>>
>>>>> My wild guess is that with short-lived simulations,
>>>>> what counts most may be the job or application
>>>>> startup and wrapup times, which can vary significantly in a cluster,
>>>>> especially a big cluster, overwhelming and obscuring your program's
>>>>> execution time.
>>>>> Most MPI benchmark suites recommend
>>>>> that you "warm up" your tests/benchmarks
>>>>> for a time long enough to reduce such startup/wrapup effects.
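>>>>>
>>>>> (A minimal sketch of that warm-up idea, assuming <mpi.h> is included,
>>>>> MPI_Init has been called, and do_timestep() is a hypothetical
>>>>> per-iteration routine: run a few untimed iterations first, then time
>>>>> a long stretch of work:)
>>>>>
>>>>> for (int i = 0; i < 10; ++i)
>>>>>     do_timestep();               /* untimed warm-up iterations */
>>>>>
>>>>> double t0 = MPI_Wtime();
>>>>> for (int i = 0; i < 10000; ++i)
>>>>>     do_timestep();               /* timed production iterations */
>>>>> double elapsed = MPI_Wtime() - t0;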
>>>>>
>>>>> My two cents,
>>>>> Gus Correa
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>