Re: what is the best datatype for..

Tech-Archive recommends: Repair Windows Errors & Optimize Windows Performance



On 2007-11-19 16:56:41 -0800, "Hilton" <nospam@xxxxxxxxxx> said:

Peter,

A dictionary implementation will be faster than your suggested
implementation.

You misunderstood probably because I never explained it well enough.

Yes, it's hard to know what a person is talking about when they don't explain themselves.

If I understand your suggestion correctly, you are proposing creating an
array that will be scanned searching for the index value, at which point
the count can be updated.

Nope, no scanning whatsoever.

If you know you'll have say 1000 possible indices, then define an array of
1000 structs, then when you see a value of 605, you simply do
"array[605].count++" (i.e. your basic histogram code). I bet that is faster
than any dictionary implementation.

Yes, that's probably true. Based on your updated description, I'd agree the dictionary implementation will be slower. However, it will only be slower by a very tiny amount (assuming the rest of the code is doing anything interesting, it would be difficult to even measure the difference...the Dictionary class is still constant order just like your lookup-array), and your implementation will use a lot more memory than a dictionary would unless the range of values is completely used or at least nearly so.

It also won't be flexible if the input data changes for any reason, since the implementation itself is so dependent on the input data.

Finally, one could use the SortedDictionary class for simplicity. It would be much slower for the counting part of the algorithm (O(log n) instead of constant order), but when you're done counting the data would already be sorted. In the end, I'd be surprised if the total time spent was significantly different (for sure, the order of the algorithm would be the same: O(n log n) for all three variations).

"Way faster" isn't a well-defined phrase, but I doubt that any reasonable interpretation of "way faster" would properly describe the minimal difference in performance between using a dictionary and your lookup-array proposal.

In fact, you don't even need the struct
if most of the work is 'reading' the value and the sorting doesn't need to
be 'high-performance' - then you can define an array of ints and not structs
and spend a little more time doing the sorting. To decide, we'd need to
understand more about the nature of the data.

Well, personally I would almost never implement a design like the one you proposed. I certainly would never do it based solely on performance reasons. If my input data was a reasonably narrow range (say, 16-bit values or less), _and_ I expected the input values to cover the range with very few exceptions, then I might choose a solution like the one you proposed. But mainly because it's simpler, not because it might be a little faster.

Frankly, when you write stuff like "Being a CF engineer, I'm always looking to optimize", you are REALLY insulting to those of us who don't write CF code. As if we aren't concerned with optimization as well. Furthermore, it's pretty clear from this thread and some others we've seen that you have a very specific and narrow idea of what "optimize" means, and if someone doesn't find your choice of trade-offs useful, you dismiss them as someone who isn't optimizing their code.

The fact is, here you've suggested a solution that isn't really going to gain any significant performance, and which has its own costs that may not be appropriate for the situation (and in fact is likely to be inappropriate for most situations, _especially_ in CF where memory is at a much greater premium than on a desktop computer).

Calling the dictionary solution "inefficient" is just plain dumb; it might not be quite as fast as a straight array lookup, but it's not like it's a _slow_ design. It's not "inefficient" at all. It's just a different way of approaching the problem and your implication that anyone who might use a dictionary implementation is writing inefficient code is very annoying.

So, if you're wondering why your posts seem to generate what you think are "flames", you might take a good look at how you write your "suggestions" and whether they really qualify as "more efficient" anyway.

Pete

.



Relevant Pages

  • Re: what is the best datatype for..
    ... array that will be scanned searching for the index value, ... It also won't be flexible if the input data changes for any reason, ... difference in performance between using a dictionary and your lookup-array ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: About Thread Completion Notify.
    ... This is the reason that i must keep track of the threads and they have to ... i remove it from the array. ... > (which you probably don't since you are servicing requests in separate ... you shouldn't have a need for waiting on events. ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Beginner Programmer needs help
    ... >> which reliably gets input data. ... Doing user input in a 'foolproof' way is ... question is what state to leave the array in after such an ESC-press ...
    (comp.lang.cpp)
  • Re: please confirm something (jdbc)
    ... of its internal buffer so that will double the amount of memory that you need. ... That's the reason I wrote my own dynamic byte array ... Given all the memory re-allocation that goes on inside the ByteArrayOutputStream anyway, I'm not sure that one more allocation is all that much extra overhead. ... If I have two ways of coding something, and I know that one will definitely use less memory (or be quicker for whatever reason) then I will choose the one "better" one. ...
    (comp.lang.java.programmer)
  • Re: Buffer Overflow. What is it?
    ... ]I saw so many buffer overflow attacks on Windows OS. ... How can such event allows an attacker to put a program on the ... area allocated for that array. ... simply copy all the input data given to the routine into that array. ...
    (comp.security.misc)