Re: How to best extract a list of identical keys in a sorted ArrayList with BinarySearch ?



On Fri, 01 Jun 2007 16:36:02 -0700, Bruce Wood <brucewood@xxxxxxxxxx> wrote:

You stopped reading at "SortedList". I'm proposing the classic "list
of lists" solution: the sorted list contains one entry for each key,
where that entry contains all of the items that match that key.

I didn't stop reading. I simply got distracted by the implication that you could use some sort of hashed access to the SortedList. :)

[...]
I have never looked at the underlying implementation for a SortedList.
If you are correct, then I'm appalled that it does not combine a list
and a hash table. If the indexer on key has to do a binary search then
that's a terrible waste of time. I'd rather trade memory for more
speed, personally.

Well, there's a Hashtable class. I presume that SortedList doesn't do hashing, because someone who wanted hashing would just use the Hashtable class instead, or possibly would use the two class together to achieve hashing with sorting. Hashing and sorting are so fundamentally different that I just don't see a simple collection class actually implementing both. Not that it couldn't, just that it would make the interface so complicated, merging two completely different behaviors into a single class.

[...]
If constant-order searching is desired, then a hashtable-based data
structure needs to be used, like Hashtable or the generic Dictionary<>.

Yes, and if SortedList truly is simply an ordered list with a key
indexer that searches by binary search or some other such method, then
I wouldn't propose it.

From the documentation of SortedList.Item:

Retrieving the value of this property is an O(log n)
operation, where n is Count

Sure looks like a binary search to me. :)

In that case the OP should use a hash table and, if ordered access is
required for some reason, either maintain a parallel sorted list or
sort it on an as-needed basis.

Well, IMHO it really depends on the performance needs. A binary search is actually pretty fast, even on very large data. The main problem with it is that it requires that the data be kept in sorted order in the first place, which can be expensive. But if you already have the requirement that the data be sorted, I see no need to switch to hashing to find data. Even if you assume that the binary search is slower than a lookup by hash value, it's not going to be *much* slower (certainly not an order of magnitude or anything like that).

In addition, hashing of course has the trade-off of memory requirements versus collisions. With any large data set, unless you have a really huge hash table, you're going to have a fair number of collisions, which of course you have to search linearly through before getting the item you really want. That linear search can easily consume the same time as a simple binary search on sorted data would, depending on the number of collisions for that hashed value.

Hashing works well when the data isn't sorted in the first place and you have no other need to sort it, but if the data needs to be sorted anyway, there's really no reason to not just use a binary search.

I still can't believe that SortedList doesn't use a dual structure
internally. Yuck.

Well, it does. It has an index array and the value array. :) It's just that neither of those data structures are a hash table. :)

Pete
.



Relevant Pages

  • Re: How to best extract a list of identical keys in a sorted ArrayList with BinarySearch ?
    ... where that entry contains all of the items that match that key. ... If the indexer on key has to do a binary search then ... hashing, because someone who wanted hashing would just use the Hashtable ... Hashing and sorting are so fundamentally different ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Choice of data structures
    ... > probably use some generic sequence data type, ... I'm wondering if you need the hashing. ... random-access operation, then sure, hashing is necessary. ... You could store the time in each piece of data, then use binary search ...
    (comp.lang.lisp)
  • Re: fast dictionary search algorithm
    ... > I have a text file with words and meanings. ... The algorithm ... > I have considered hashing and multiple level hashing as an option ... and use binary search on them from there. ...
    (comp.programming)
  • Re: fast dictionary search algorithm
    ... >> I have considered hashing and multiple level hashing as an option ... > don't go the route of rehashing. ... and use binary search on them from there. ... A reasonable system will never ...
    (comp.programming)
  • Re: OT: Binary Search - Should They Know It?
    ... Patterns of Enterprise Architecture ... A graduate from a university degree program absolutely should be familiar with basic data structures such as a hash table, or an algorithm like a binary search. ... In the end, what a company needs to do is figure out what they are hiring for, what skills that person needs, and figure out a way to measure those skills in a reliable way. ...
    (microsoft.public.dotnet.languages.csharp)