Re: Profiling tools/how to discover hot-spots in my code?
From: Joseph M. Newcomer (newcomer_at_flounder.com)
Date: 09/06/04
- Next message: Joseph M. Newcomer: "Re: Changing size and placement of maximized dialog"
- Previous message: Joseph M. Newcomer: "Re: Clicking CButton in child dialog window freezes app."
- In reply to: Casper B.: "Re: Profiling tools/how to discover hot-spots in my code?"
- Messages sorted by: [ date ] [ thread ]
Date: Sun, 05 Sep 2004 23:29:06 -0400
I think that adding a siulation/stub class will add a simulation/stub class. It is not
clear that it will produce anything meaningful in terms of performance measurement. Disk
access times tend to average out (after all, they are taken from a uniform distribution)
so the real hotspots will show up independent of the disk access patterns.
If you have a problem that is based on CPU load, there are programs like the performance
monitor that will give you load statistics, such as the number of page faults taken, the
number of interrupts per second handled, and similar features.
Disabling the cache would skew results so badly that they would really become meaningless.
I think you are obsessing over the wrong problems.
Program hotspots are caused by poor algorithm design. Sometimes, you don't know the
algorithm design is poor until you discover that an apparently simple decision has serious
performance implications. In almost all cases I found, especially in my own code, it is
the second-order effects that matter. In some cases, third-order effects dominate. For
example, an algorithm which results in a lot of storage being allocated, and which causes
the storage to be spread out all over memory, can kill you in page faults. You will never
find this performance glitch by profiling your code, because the profiling tools don't
measure this. You will find it by using perfmon and discovering that your page fault rate
is going up. But I'm not aware of any simple tool that lets you find out what aspect of
your program is causing the page faulting.
You might lookinto profiling tools from places like NuMega Software (www.numega.com will
link to their current owner's site) and other commercial offerings, but all the serious
tools are very expensive and it is not clear what they are going to answer. You first have
to know what the question is.
As I said, I spent a lot of years doing performance analysis. Even with detailed tools,
this is an art.
I also built one of the best storage allocators of its era. Yet the product group claimed
that when they used their program-counter hotspot analyzer, 80% of the time was in the
storage allocator. Was it my fault? The answer turned out to be "no", because I flipped on
the internal performance counters in the allocator. The reason the allocator was the
hotstop was that it was being called 4,000,000 times. Why? Bad algorithm design, one level
up. The programmer wasn't sure how big a buffer would be needed, so he computed the size
and allocated it dynamically, each time the subroutine was called. One level above him,
the next programmer called him 2,000,000 times to do what was a simple task. But it
involved calling the storage allocator!
We changed the code to preallocate some fixed number of bytes on the stack, and only call
the allocator if that was too small. This so reduced the number of calls to the allocator
that the allocator was no longer the hotspot. The next hotspot was the algorithm that
called this subroutine 2,000,000 times. It turns out that most of the time, the string
value was constant and never changed, so the first thing to do was to cache the value, and
only call if it was required to be recomputed. This knocked another couple million calls
out of the picture.
Expecting to see a single hotspot which when fixed gives you an overall order of magnitude
performance improvement is optimistic. I've seen it happen perhaps twice in 40 years, and
always because there was a really stupid algorithmic design error. Mostly, you hope to get
a factor of two in a subroutine, a factor of 5 in some more complex algorithm, a 10%
improvement somewhere else. When all of these are combined, you might see an order of
magnitude improvement ovreall, but this process can take weeks, assuming that all you do
during that time is concentrate on performance improvement.
Statistical variation in disk latency is irrelevant. Decreasing the number of accesses to
the disk itself matters. We once got 20% improvement by changing a getc() to a gets() and
iterating over the string. We got a 30% improvement in a compiler by turning off the
formatting of the listing line unless a listing was being generated (the programmer called
a subroutine to format the line, then threw it away if it wasn't being used). We got a 20%
improvment by not copying a buffer. Those are the things you look for first. Cache hits
rarely matter in general-purpose algorithms. But if you have something like an image
processing algorithms, changing the algorithm to maximize cache hits in the L1 and L2
caches can buy an order of magnitude performance in that loop, which could translate into
either an order of magnitude performance improvement in the program, or a 5% performance
improvement in the program, depending on what else is done before and after.
I wouldn't worry about the statstical behavior of the disk until I knew what the program
was actually doing. Only if I was convinced that disk performance was the bottleneck would
I waste any effort trying to measure it, and it is more likely that I'd redesign the
algorithms rather than worry about statistical variation.
joe
On Sat, 04 Sep 2004 04:06:17 -0700, "Casper B." <casper@jbr.dk> wrote:
>Indeed, life just got a little harder... I am a bit baffled that I'm
>left with primitive heuristic approaches.
>
>Basically I guess I will have to #ifdef and utilize a simulation/stub
>class in order to always get the same deterministic time of my disk
>accessing - I had just hoped there was some kind of tool that would be
>able to deduce hotspots based on ammount of timeslices methods got. This
>would be so practical and hardly impossible to make? (Even stepping
>through with coulthe debugger one could deduce a whole lot but it would
>take forever and the CPU load would be so unrealistically low that it
>would render any conclusion useless).
>
>And as for the cache issue, since there's no way to disable this I guess
>one would have to pull enough random data through the machinery in order
>to uptain a creditable first run-time environment.
>
>Thanks for your feedback,
>/Casper
>
>Joseph M. Newcomer wrote:
>
>> There is no profiling tool in existence which does not degrade or contaminate the
>> application it is measuring. So you can factor that problem out. It ALWAYS produces
>> problems. Slight variants in timing, slight variants in cache line reuse, etc. will
>> perturb the measurement. Paging performance will change. Nondeterminstic disk timing
>> issues will change. Life Is Hard.
>>
>> Unless you have a bus analyzer doing hardware tracing and reverse-engineer program flow
>> from the trace, you will have perturbations. So just ignore the degradation/contamination
>> issue, it is always there.
>>
>> Put them under an #ifdef and get good approximate numbers. Those will be good enough.
>>
>> I spent 15 years doing performance measurement and analysis. We used many tools, including
>> the equivalent of QueryPerformanceCounter. I had performance tools that slowed the
>> application by two orders of magnitude, and still gave important data. Generally hotspots
>> are so massive that precise timing is irrelevant.
>> joe
>>
>> On Fri, 03 Sep 2004 23:08:59 -0700, "Casper B." <casper@jbr.dk> wrote:
>>
>>
>>>I believe the upcoming Whidbey/VS.2005 includes tools for profiling
>>>application, but what techniques are exists/are common practice as of
>>>today to discover where the bulk of the work is done during program
>>>execution?
>>>
>>>I use the QueryPerformanceCounter API at strategic places but my current
>>>application includes some nondeterministic disc-accessing in my
>>>algorithms which makes it hard to place a bunch of API call's in without
>>>degrading/contaminating the application.
>>>
>>>Thanks in advance,
>>>/Casper
>>
>>
>> Joseph M. Newcomer [MVP]
>> email: newcomer@flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
- Next message: Joseph M. Newcomer: "Re: Changing size and placement of maximized dialog"
- Previous message: Joseph M. Newcomer: "Re: Clicking CButton in child dialog window freezes app."
- In reply to: Casper B.: "Re: Profiling tools/how to discover hot-spots in my code?"
- Messages sorted by: [ date ] [ thread ]