Re: The inaugural VB6 vs dot net test
- From: "Mike Williams" <mikea@xxxxxxxxxxxxxxxxx>
- Date: Sun, 2 Dec 2007 22:26:36 -0000
"Ulrich Korndoerfer" <ulrich_wants_nospam@xxxxxxxxxxxx> wrote in message news:utfcouSNIHA.5208@xxxxxxxxxxxxxxxxxxxxxxx
In my opinion the reason why one could hope that
counting down to 0 makes loops run faster comes
from how the compiler handles those loops. Some time
ago I had a look at the assembler code the compiler generates for counting up loops . . . [snip]
Well, all compilers are different of course, but when I used to write machine code many years ago (both raw machine code and using an assembler), counting down to zero was definitely faster than counting up to a specific number. As long as the initial value of the loop counter was positive when viewed as a signed value, you could simply perform a DEC (decrement the loop counter by 1) and follow it with a BPL (Branch on Plus), which takes the branch (and therefore repeats the loop) unless the result is negative. The loop therefore repeats, using the minimum number of loop maintenance instructions, until the counter is decremented from zero, at which point the result becomes negative and the branch is no longer taken. The alternative of counting up requires an INC (increment the loop counter by 1) followed by a compare instruction and then the appropriate branch instruction, and therefore takes more clock cycles.
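To make the two loop shapes concrete, here is a minimal sketch in C (not the VB6 code from this thread). The function names are hypothetical and chosen only for illustration. On the older processors described above, the second loop maps onto the DEC / BPL pattern; a modern optimizing compiler may well generate similar code for both, so treat this as an illustration of the idea rather than a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Counting up: each pass needs an increment, a compare against n,
   and a conditional branch. (Hypothetical name, for illustration.) */
uint32_t sum_counting_up(const uint8_t *data, int32_t n)
{
    uint32_t total = 0;
    for (int32_t i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

/* Counting down to zero: the decrement itself sets the sign flag, so
   the loop can end with "decrement, branch while not negative".
   The counter must be non-negative when viewed as a signed value. */
uint32_t sum_counting_down(const uint8_t *data, int32_t n)
{
    uint32_t total = 0;
    assert(n >= 0);
    for (int32_t i = n - 1; i >= 0; i--) {
        total += data[i];   /* note: memory is read from the top down */
    }
    return total;
}
```

Both functions return the same total; the only differences are the loop maintenance and the order in which memory is touched.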
However, and this is the crucial point, saving a few clock cycles on the loop maintenance instructions is only worthwhile if doing so does not in itself cause the code to run more slowly. That is exactly what happens if you end up reading and writing the memory in a "top down" order, which goes against the grain of the memory cache on many systems and slows the code down by a much greater margin than the reduced loop maintenance speeds it up. Main memory runs at a very much lower frequency than the processor itself (because of the multiplication factor used on modern processors). To improve matters, more data than is currently required is read from main memory into the cache, which can be read at a much higher frequency than main memory can. The speed of any specific memory access therefore depends on whether the contents of the required address are already in the cache and still marked as valid. Accessing memory in an inefficient manner (as would be the case if the memory was accessed in a top down fashion) produces fewer cache "hits" than accessing it in a more efficient manner, and slows the code down a lot, because accessing main memory is very slow in comparison to accessing cache memory, and particularly in comparison to how much work can be done within the processor itself in an equivalent amount of time.
So, in order to count down to zero instead of counting up to a specific value while still accessing the memory in the cache-friendly order, you would need to add further code to reverse the direction of the memory access, and those extra instructions would negate all of the small "loop maintenance" advantage that counting down produces, and probably a lot more as well. Things are probably quite a bit more complex than this on modern processors (my own machine code experience is more than twenty years out of date) but I'm fairly sure that the same sort of reasoning still applies.
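As a rough sketch of that trade-off, again in C with a hypothetical function name: the loop counter below counts down to zero, but an extra subtraction on every pass converts it into an ascending address so that memory is still walked from the bottom up. That extra subtraction is the kind of added work referred to above; whether it costs more than the saved compare on any given processor and compiler is an assumption that would need to be measured.

```c
#include <assert.h>
#include <stdint.h>

/* Invert every byte of a buffer using a count-down loop counter while
   still touching memory in ascending order. (Hypothetical name.) */
void invert_down_counter_ascending(uint8_t *pixels, int32_t n)
{
    assert(n >= 0);
    for (int32_t i = n - 1; i >= 0; i--) {
        int32_t addr = n - 1 - i;              /* extra work on every pass */
        pixels[addr] = (uint8_t)~pixels[addr]; /* ascending memory access */
    }
}
```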
If you want to check what happens when you "go against the grain" of the memory cache, then try changing one of the 1D code examples in this thread so that it counts down instead of up and see how long it takes to perform the bitmap invert task. Or you can try altering my own 2D example by swapping the X, Y loops around (so that K is the inner loop rather than J), which would also fail to take full advantage of the memory cache.
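The loop-swapping experiment can be sketched in C as follows. This is not the 2D example from this thread; the function names and the flat, row-by-row buffer layout (one byte per pixel) are assumptions made for illustration. Both functions invert the same bitmap and produce identical results, but the first walks memory sequentially while the second jumps a whole row's width between consecutive accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inner loop runs along a row: consecutive accesses are adjacent in
   memory, which suits the cache. (Hypothetical name and layout.) */
void invert_row_major(uint8_t *pixels, size_t width, size_t height)
{
    assert(pixels != NULL);
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            pixels[y * width + x] = (uint8_t)~pixels[y * width + x];
        }
    }
}

/* Loops swapped: the inner loop runs down a column, so consecutive
   accesses are `width` bytes apart and reuse far less cached data. */
void invert_column_major(uint8_t *pixels, size_t width, size_t height)
{
    assert(pixels != NULL);
    for (size_t x = 0; x < width; x++) {
        for (size_t y = 0; y < height; y++) {
            pixels[y * width + x] = (uint8_t)~pixels[y * width + x];
        }
    }
}
```

Timing the two on a bitmap large enough not to fit in the cache is one way to see the effect; the size of the difference will depend on the machine.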
When performing tasks such as this, the way in which you access the memory is of paramount importance and you always need to make sure that you are making full and efficient use of the memory cache.
Mike