Re: what is the best datatype for..

Tech-Archive recommends: Repair Windows Errors & Optimize Windows Performance



Dude, chill. The guy asked how to write a histogram, I said use an array of
integers.

Hilton


"Peter Duniho" <NpOeStPeAdM@xxxxxxxxxxxxxxxx> wrote in message
news:2007111919142216807-NpOeStPeAdM@xxxxxxxxxxxxxxxxxx
On 2007-11-19 16:56:41 -0800, "Hilton" <nospam@xxxxxxxxxx> said:

Peter,

A dictionary implementation will be faster than your suggested
implementation.

You misunderstood probably because I never explained it well enough.

Yes, it's hard to know what a person is talking about when they don't
explain themselves.

If I understand your suggestion correctly, you are proposing creating an
array that will be scanned searching for the index value, at which point
the count can be updated.

Nope, no scanning whatsoever.

If you know you'll have say 1000 possible indices, then define an array
of
1000 structs, then when you see a value of 605, you simply do
"array[605].count++" (i.e. your basic histogram code). I bet that is
faster
than any dictionary implementation.

Yes, that's probably true. Based on your updated description, I'd agree
the dictionary implementation will be slower. However, it will only be
slower by a very tiny amount (assuming the rest of the code is doing
anything interesting, it would be difficult to even measure the
difference...the Dictionary class is still constant order just like your
lookup-array), and your implementation will use a lot more memory than a
dictionary would unless the range of values is completely used or at least
nearly so.

It also won't be flexible if the input data changes for any reason, since
the implementation itself is so dependent on the input data.

Finally, one could use the SortedDictionary class for simplicity. It
would be much slower for the counting part of the algorithm (O(log n)
instead of constant order), but when you're done counting the data would
already be sorted. In the end, I'd be surprised if the total time spent
was significantly different (for sure, the order of the algorithm would be
the same: O(n log n) for all three variations).

"Way faster" isn't a well-defined phrase, but I doubt that any reasonable
interpretation of "way faster" would properly describe the minimal
difference in performance between using a dictionary and your lookup-array
proposal.

In fact, you don't even need the struct
if most of the work is 'reading' the value and the sorting doesn't need
to
be 'high-performance' - then you can define an array of ints and not
structs
and spend a little more time doing the sorting. To decide, we'd need to
understand more about the nature of the data.

Well, personally I would almost never implement a design like the one you
proposed. I certainly would never do it based solely on performance
reasons. If my input data was a reasonably narrow range (say, 16-bit
values or less), _and_ I expected the input values to cover the range with
very few exceptions, then I might choose a solution like the one you
proposed. But mainly because it's simpler, not because it might be a
little faster.

Frankly, when you write stuff like "Being a CF engineer, I'm always
looking to optimize", you are REALLY insulting to those of us who don't
write CF code. As if we aren't concerned with optimization as well.
Furthermore, it's pretty clear from this thread and some others we've seen
that you have a very specific and narrow idea of what "optimize" means,
and if someone doesn't find your choice of trade-offs useful, you dismiss
them as someone who isn't optimizing their code.

The fact is, here you've suggested a solution that isn't really going to
gain any significant performance, and which has its own costs that may not
be appropriate for the situation (and in fact is likely to be
inappropriate for most situations, _especially_ in CF where memory is at a
much greater premium than on a desktop computer).

Calling the dictionary solution "inefficient" is just plain dumb; it might
not be quite as fast as a straight array lookup, but it's not like it's a
_slow_ design. It's not "inefficient" at all. It's just a different way
of approaching the problem and your implication that anyone who might use
a dictionary implementation is writing inefficient code is very annoying.

So, if you're wondering why your posts seem to generate what you think are
"flames", you might take a good look at how you write your "suggestions"
and whether they really qualify as "more efficient" anyway.

Pete



.



Relevant Pages

  • Re: what is the best datatype for..
    ... array that will be scanned searching for the index value, ... It also won't be flexible if the input data changes for any reason, since the implementation itself is so dependent on the input data. ... Calling the dictionary solution "inefficient" is just plain dumb; it might not be quite as fast as a straight array lookup, but it's not like it's a _slow_ design. ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Beginner Programmer needs help
    ... >> which reliably gets input data. ... Doing user input in a 'foolproof' way is ... question is what state to leave the array in after such an ESC-press ...
    (comp.lang.cpp)
  • Re: Buffer Overflow. What is it?
    ... ]I saw so many buffer overflow attacks on Windows OS. ... How can such event allows an attacker to put a program on the ... area allocated for that array. ... simply copy all the input data given to the routine into that array. ...
    (comp.security.misc)
  • Re: Re: Re: RE: Perl Question: Optimization
    ... @ids array changes, the statement gets re-prepared, so unless you have the ... # THIS INPUT DATA FILE CONTAINS ABOUT 40000 ROWS ... >prepared statement inside the loop of 40000. ...
    (perl.dbi.users)
  • Re: About Thread Completion Notify.
    ... This is the reason that i must keep track of the threads and they have to ... i remove it from the array. ... > (which you probably don't since you are servicing requests in separate ... you shouldn't have a need for waiting on events. ...
    (microsoft.public.dotnet.languages.csharp)