Re: Composite Keys with <TKey,TValue> pairs generic collections

Tech-Archive recommends: Repair Windows Errors & Optimize Windows Performance



Jon Skeet [C# MVP] wrote:
Jeroen Mostert <jmostert@xxxxxxxxx> wrote:
Yes, they do. I only remember offhand because the hash algorithm *used* to be simple XOR-ing, which is horrible. It's better now.
>>>
Why is that horrible, and how is it better? Assuming the hash functions of the individual members do their jobs properly and generate nicely distributed hash codes, how would XOR'ing them be any worse than another operation? Conversely, if the hash functions of the individual members *don't* produce nicely distributed hash codes, what sort of algorithm would minimize the damage for the composite hash code?
>>

It's quite common to store the same type for both members of a pair. At that point, XOR gives the unpleasant behaviour. Consider a map with the following pairs of strings in:

{"hello", "hello"} => hash of 0, XOR of x and x is always 0
{"there", "there"} => hash of 0 again
{"a", "a"} => and again
{"a", "b"} => some value k
{"b", "a"} => k again

Values don't tend to be uniformly distributed - values (x, y) and (y, x) are likely to come up in the same map in various scenarios, as are various (x, x) values.

Nice. I hadn't considered that -- I haven't had too many scenarios yet where such collisions could occur, although I've had many where the types were equal.

This could be an common scenario if you're handling coordinates of some form, though. If you had need to hash those for some purpose, a simple XOR'ing of values could be disastrous.

Personally I tend to use something like:

int hash = 17;
hash = 23*hash + firstField.GetHashCode();
hash = 23*hash + secondField.GetHashCode();
hash = 23*hash + thirdField.GetHashCode();

etc

17 and 23 are arbitrary coprime numbers here.

Good. Don't forget to wrap that in an unchecked { }, though. :-)

--
J.
.



Relevant Pages