Re: any performance difference?

There is only 1 answer to questions like this and it is 'run a test'

I appreciate the answer, but as I said, the problem is: run WHERE :)
This code runs under Win32, Win64, Solaris, MacOS and some flavors of
Linux... Each platform has at least a couple of processors (e.g. Mac
Intel/PPC) and a couple of compilers, so you quickly deduce the number of
tests is exponential...
What we need (at the beginning) is some abstract remark, for example: is
it true that calling a function via a pointer improves the branch
prediction?

I have no idea. Branch prediction is CPU specific, so you'd have to answer
this question for each platform your program runs on.
Another problem is that each platform has its own compiler, which performs
different optimizations.

A couple of years ago, a couple of colleagues and I entered a coding
challenge where you got a buffer with consecutive 7 bit characters, so the
first bit of char 1 was the bit right after bit 6 of char 0.
The challenge was to calculate the parity bit of a very large buffer in as
short a time as possible. Code size or memory use was not evaluated.
We decided we wanted to win 1st prize so we spent a lot of time on it. The
problem was that once you have made an efficient algorithm, each further
optimization becomes CPU dependent.
I went for a lookup based approach and my algorithm was blazing fast on a
PIII with its superior cache, but it sucked big time on a P4 which has a
sucky cache and memory interface.

My colleagues went for a calculation based approach which outperformed mine
by 50% on a P4, but was slow as molasses on my PIII laptop.
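
To make the two styles concrete, here is a minimal sketch of a lookup based
and a calculation based parity routine. This is not the contest code. It is a
simplification that assumes the parity of the whole buffer is wanted and
treats the buffer as plain 8 bit bytes (the 7 bit packing does not change the
total number of set bits). The names parity_lookup and parity_calc are made up
for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build a 256-entry table holding the parity (0 or 1) of every byte value.
static std::array<std::uint8_t, 256> make_parity_table() {
    std::array<std::uint8_t, 256> table{};
    for (unsigned v = 0; v < 256; ++v) {
        unsigned x = v;
        unsigned p = 0;
        while (x != 0) {
            p ^= (x & 1u);
            x >>= 1;
        }
        table[v] = static_cast<std::uint8_t>(p);
    }
    return table;
}

// Lookup based: one table read per input byte.
unsigned parity_lookup(const std::uint8_t* buf, std::size_t n) {
    static const std::array<std::uint8_t, 256> table = make_parity_table();
    unsigned p = 0;
    for (std::size_t i = 0; i < n; ++i) {
        p ^= table[buf[i]];
    }
    return p;
}

// Calculation based: XOR all bytes together, then fold the result
// down to a single bit with shifts. No table, only ALU work.
unsigned parity_calc(const std::uint8_t* buf, std::size_t n) {
    unsigned acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc ^= buf[i];
    }
    acc ^= acc >> 4;
    acc ^= acc >> 2;
    acc ^= acc >> 1;
    return acc & 1u;
}
```

Both functions return the same answer for any input. Which one is faster
depends on the cache, the memory interface and the compiler, which is exactly
why it has to be measured on the target machine.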

In the end we were beaten by one of our customers who has a very good
relationship with the vendor that organized the challenge. He knew the exact
type of machine that was used by the judging committee, confiscated that
same machine somewhere within his company and optimized for that machine.

Moral of the story:
a) there is no good way to optimize across all possible platforms.
b) the only way to know what you will get for performance is to test.

unless these constructs are in tight loops that are executed millions of
times in succession

yes, that's exactly the case: every clock cycle counts.

I appreciate your problem and I would really like to help you, but you
cannot micro-optimize a generic piece of code, and then hope that you will
squeeze every last bit of performance out of the code when it is compiled
with different compilers and runs on different platforms.

To get a feel, create 2 test cases in plain C++. Your product is cross
platform anyway.
Compile them for the different platforms and look at the results.
If 1 approach always outperforms the other, go for it.
If not, then decide which platform is most worthwhile to optimize for.

Most of these proof of concept tests can be done in a single day and will
give you a solid basis to make decisions on.
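
Such a test case could look roughly like the sketch below, which times a
direct call against a call through a function pointer using only standard
C++. The workload step() and the helpers run_direct, run_via_pointer and
time_ms are made up for this example, so substitute your real inner loop.
Note that an optimizer may inline either variant if it can see the pointer
target, so check the generated code or read the pointer from somewhere the
compiler cannot see through.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical workload: one cheap step that depends on the previous result,
// so the loop cannot be collapsed trivially.
static std::uint32_t step(std::uint32_t x) {
    return x * 1664525u + 1013904223u;
}

// Variant 1: call the function directly.
std::uint32_t run_direct(std::size_t iters) {
    std::uint32_t x = 1;
    for (std::size_t i = 0; i < iters; ++i) {
        x = step(x);
    }
    return x;
}

// Variant 2: call the same function through a pointer.
std::uint32_t run_via_pointer(std::uint32_t (*fn)(std::uint32_t),
                              std::size_t iters) {
    std::uint32_t x = 1;
    for (std::size_t i = 0; i < iters; ++i) {
        x = fn(x);
    }
    return x;
}

// Time one callable in milliseconds. The result is stored in 'sink' so the
// compiler cannot discard the work as unused.
template <typename F>
double time_ms(F&& f, std::uint32_t& sink) {
    const auto t0 = std::chrono::steady_clock::now();
    sink = f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

A caller would run something like
time_ms([] { return run_direct(100000000); }, sink) and the matching
run_via_pointer call several times on each platform and compiler, then
compare the best or median times rather than a single run.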

I have been in a performance critical situation before where the performance
was part of the requirements (image processing on satellite data).
I tried several methods for optimizing my algorithms, but in the end I used
the production system for benchmarking test cases, and optimized for the
performance on that machine with those specific processors, cache and
memory.

--
Kind regards,
Bruno van Dooren MVP - VC++
http://msmvps.com/blogs/vanDooren
bruno_nos_pam_van_dooren@xxxxxxxxxxx