Re: Comment on how to uniquely track your objects in C# / hash table / get hash code



On Aug 5, 1:08 am, Pavel Minaev <int...@xxxxxxxxx> wrote:
On Aug 5, 2:25 am, raylopez99 <raylope...@xxxxxxxxx> wrote:

A quick sanity check, and I think I am correct, but just to make
sure:  if you have a bunch of objects that are very much like one
another you can uniquely track them simply by using an ArrayList or
Array, correct?

You can uniquely track an instance of any reference type, no matter
what data is inside, because it has identity, which can be compared
against by using Object.ReferenceEquals(). It is this identity that
serves as a basis for default implementations of GetHashCode() and
Equals(), and more often than not it is good enough.


Thanks, I did not think about Object.ReferenceEquals(). I was
concentrating on using a generic List and writing my own
"Equals" (override) {public override bool Equals(Object obj) }. But I
like .ReferenceEquals since it's a library function and I trust code
written by librarians--I just hobby code, so, like a beginner chess
player who memorizes grandmaster moves, I prefer to use professional
code whenever possible.


But, as pointed out in the links below, hash functions can be non-
unique, or give collisions with two different objects having the same
hash (rare, but happens),

This is largely irrelevant to the issue of performance, since hash
tables work correctly even with collisions - just slower. You're still
guaranteed to have O(1) on average on random input, though
pathological cases can become O(n).

Yes, hash tables may work correctly with collisions, but see my
philosophical thread to Arne--you can have problems if a collision
occurs in a Array or List when you are trying to find a 'unique'
object that is not so unique.


Maintaining a hash table also has some additional overhead, both on
insertion and on retrieval (computing hashes, balancing buckets etc).
In practice, particularly for small collections (roughly ~100 items),
where both insertions and lookups happen frequently, a plain list with
linear search can be more efficient than a hashtable both in terms of
speed and memory.

I didn't know the small collections cutoff was that small! I thought
it would be closer to 10k items. It shows the librarians who wrote
the code for hash table did a very good job, to get the number that
low.


For larger stuff, hashtable is worth considering -
though even then, if inserts are frequent and lookups are very rare, a
list is probably still better (but it is not a usual case).


I wish I had a table to consult to see what container to use for what
type situation... probably there's a table somewhere on the net.


On the other hand, if you fill the collection once, and then just do
lookups, never modifying it, then the most efficient implementation is
pre-sorting the collection and doing binary search on it
(Array.BinarySearch or List<T>.BinarySearch, as needed - but if a
collection is immutable, might as well make it an array for the slight
performance increase that gives).

Very interesting. I'll keep this in mind-- use binary tree for
objects you will not move around much, which is akin to what I heard
about fast lookups for balanced red/black binary trees. But I think
the philosophical question I posed to Arne in this thread will also
apply to binary trees. Probably it's a problem you cannot escape
from.

RL
.



Relevant Pages

  • Re: hash tables, non-random keys
    ... I have a hash table that will be containing a non-uniform distribution ... The keys will be inserted sequentially, ... would just use an array. ... something I also sometimes do is to use a small hash to optimize lookups ...
    (comp.programming)
  • Re: Comment on how to uniquely track your objects in C# / hash table / get hash code
    ... Array, correct? ... You can uniquely track an instance of any reference type, ... This is largely irrelevant to the issue of performance, since hash ... where both insertions and lookups happen frequently, ...
    (microsoft.public.dotnet.languages.csharp)
  • Duck Typed Concepts for Ruby (was Re: A use case for an ordered hash)
    ... An Sequencable mixin can be defined that implements all sorts of operations such as append, concat, splice, sort, etc. ... extending an instance of Array with Sorted if the array is known to be sorted. ... Now returning to the thread at hand we can see that the difference between the associative array and hash hierarchies is that the hash hierarchy depends upon Hashable keys. ...
    (comp.lang.ruby)
  • Re: Suggestions for double-hashing scheme
    ... >> The items that are being moved are the items in the hash table itself, ... >> which are of fixed size (they are in an array, ... > typedef struct { ... One "uchar" aka 'unsigned char' is plenty to hold a probe ...
    (comp.programming)
  • [SUMMARY] Index and Query (#54)
    ... This was a fun quiz for me because I really don't know anything about indexing ... We see in initializethat the index is just a Hash. ... an Array of symbolic document names where the word can be found ). ...
    (comp.lang.ruby)