Re: what is the best datatype for..



On 2007-11-19 12:38:47 -0800, "Hilton" <nospam@xxxxxxxxxx> said:

Hi,

Being a CF engineer, I'm always looking to optimize, so even though hashes,
dictionaries etc will work, the whole process is inefficient.

Define "inefficient".

A dictionary implementation will be faster than your suggested implementation.

Do you have a
range of values? Let's say that you know that the values will be 0-1000,
define a struct with two values; one being the value, the other being the
count. Create an array of the structs and off you go. Alternatively you
could use two separate arrays.

If I understand your suggestion correctly, you are proposing creating an array that will be scanned searching for the index value, at which point the count can be updated.

For small data sets, I think that would work fine. Nothing wrong at all. But claiming that it's more "efficient" begs the question: in what way?

It's more efficient in memory usage, but it's certainly not more efficient with respect to performance. Furthermore, for large data sets where the difference in memory footprint is likely to be of concern, the performance difference will be phenomenal. The dictionary implementation will always provide a basically constant order retrieval of the element in question, while your linear search will be linear order, taking orders of magnitude longer as the data set is also orders of magnitude larger.

In other words, the performance for just 100 elements is ten times slower than the performance for just 10 elements, and for 1000 elements is another ten times slower than for 100 elements.

Again, this is for performance reasons and I'll bet it'll be way faster than
dictionaries etc.

You'd bet wrong. Your suggestion is more memory efficient, but it's not going to be faster.

Of course, only do this extra work if you need the
performance and you're not just running a one-off process.

No, only do this extra work if you absolutely must minimize the memory footprint of the implementation. It's not going to perform better at all.

I get this feeling of pending flames - it always seems to happen when I
suggest more efficient techniques. :)

Well, a) you seem to misunderstand how the dictionary solution would work, and b) you do seem to insist on mischaracterizing techniques as "efficient" when they either are not at all, or are not efficient in a way that's is interesting in the more general case (this is not a newsgroup for writing using Compact Framework).

It is true that there are often trade-offs between being memory efficient and computation efficient. It's false for you to assert that you know best which is more important. Even on a CF-based implementation, sometimes you'll still prefer a faster implementation over one that uses less memory. It just depends on the priorities of the situation.

Pete

.



Relevant Pages

  • "Parallel.For GC problems" and a solution.
    ... running same thing in parallel, Gen 2 bytes go way up, and LOH usage goes way up. ... with the dictionaries holding arrays of keys, ... do not pass it a large array of things to process. ... With no breaks, memory goes crazy. ...
    (microsoft.public.dotnet.languages.vb)
  • Re: "Parallel.For GC problems" and a solution.
    ... I think Dictionary(of dictionary(of small array of ints)) ... with the dictionaries holding arrays of keys, ... do not pass it a large array of things to process. ... With no breaks, memory goes crazy. ...
    (microsoft.public.dotnet.languages.vb)
  • Re: Writing huge Sets() to disk
    ... >> state and taking say 600MB, pushes it's internal dictionaries ... that this code takes a lot of extra memory. ... >> I believe it's the references problem, ... the swapspace reserved grows during that posted loop. ...
    (comp.lang.python)
  • Re: Ordered dict by default
    ... because the memory used increases, so there are more cache misses. ... of insertions then traversals with my dictionaries: ... So a slight decrease in insertion speed at the cost of faster traversal seems ...
    (comp.lang.python)
  • Re: Writing huge Sets() to disk
    ... > references in python. ... > state and taking say 600MB, pushes it's internal dictionaries ... Almost anything you do copies references. ... that this code takes a lot of extra memory. ...
    (comp.lang.python)