Re: How to best extract a list of identical keys in a sorted ArrayList with BinarySearch ?

Tech Tip: Click here to run a free scan for Windows Errors and optimize PC performance



On Jun 1, 5:32 pm, "Peter Duniho" <NpOeStPe...@xxxxxxxxxxxxxxxx>
wrote:
On Fri, 01 Jun 2007 16:36:02 -0700, Bruce Wood <brucew...@xxxxxxxxxx>
wrote:

You stopped reading at "SortedList". I'm proposing the classic "list
of lists" solution: the sorted list contains one entry for each key,
where that entry contains all of the items that match that key.

I didn't stop reading. I simply got distracted by the implication that
you could use some sort of hashed access to the SortedList. :)

[...]
I have never looked at the underlying implementation for a SortedList.
If you are correct, then I'm appalled that it does not combine a list
and a hash table. If the indexer on key has to do a binary search then
that's a terrible waste of time. I'd rather trade memory for more
speed, personally.

Well, there's a Hashtable class. I presume that SortedList doesn't do
hashing, because someone who wanted hashing would just use the Hashtable
class instead, or possibly would use the two class together to achieve
hashing with sorting. Hashing and sorting are so fundamentally different
that I just don't see a simple collection class actually implementing
both. Not that it couldn't, just that it would make the interface so
complicated, merging two completely different behaviors into a single
class.

Oh, I don't know. I think the interface would look exactly like
SortedList, except that it would find things faster. :-)

If constant-order searching is desired, then a hashtable-based data
structure needs to be used, like Hashtable or the generic Dictionary<>.

Yes, and if SortedList truly is simply an ordered list with a key
indexer that searches by binary search or some other such method, then
I wouldn't propose it.

From the documentation of SortedList.Item:

Retrieving the value of this property is an O(log n)
operation, where n is Count

Sure looks like a binary search to me. :)

Yup. That's pretty conclusive. I should read the doc more closely,
huh? :-)

In that case the OP should use a hash table and, if ordered access is
required for some reason, either maintain a parallel sorted list or
sort it on an as-needed basis.

Well, IMHO it really depends on the performance needs. A binary search is
actually pretty fast, even on very large data. The main problem with it
is that it requires that the data be kept in sorted order in the first
place, which can be expensive. But if you already have the requirement
that the data be sorted, I see no need to switch to hashing to find data.
Even if you assume that the binary search is slower than a lookup by hash
value, it's not going to be *much* slower (certainly not an order of
magnitude or anything like that).

Agreed.

In addition, hashing of course has the trade-off of memory requirements
versus collisions. With any large data set, unless you have a really huge
hash table, you're going to have a fair number of collisions, which of
course you have to search linearly through before getting the item you
really want. That linear search can easily consume the same time as a
simple binary search on sorted data would, depending on the number of
collisions for that hashed value.

Hashing works well when the data isn't sorted in the first place and you
have no other need to sort it, but if the data needs to be sorted anyway,
there's really no reason to not just use a binary search.

Agreed.

I had just assumed that SortedList provided the fastest access
possible but, as you pointed out, the gain in speed wouldn't be that
much, and the amount of memory consumed wouldn't be worth it.

So, we agree: the OP should use a Hashtable (or Dictionary<>) if
sorting isn't important, and a SortedList if sorting is required.

.



Relevant Pages

  • Re: How to best extract a list of identical keys in a sorted ArrayList with BinarySearch ?
    ... where that entry contains all of the items that match that key. ... If the indexer on key has to do a binary search then ... I presume that SortedList doesn't do hashing, because someone who wanted hashing would just use the Hashtable class instead, or possibly would use the two class together to achieve hashing with sorting. ... Even if you assume that the binary search is slower than a lookup by hash value, it's not going to be *much* slower. ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: fast dictionary search algorithm
    ... > I have a text file with words and meanings. ... The algorithm ... > I have considered hashing and multiple level hashing as an option ... and use binary search on them from there. ...
    (comp.programming)
  • Re: Choice of data structures
    ... > probably use some generic sequence data type, ... I'm wondering if you need the hashing. ... random-access operation, then sure, hashing is necessary. ... You could store the time in each piece of data, then use binary search ...
    (comp.lang.lisp)
  • Re: fast dictionary search algorithm
    ... >> I have considered hashing and multiple level hashing as an option ... > don't go the route of rehashing. ... and use binary search on them from there. ... A reasonable system will never ...
    (comp.programming)
  • Re: printing nodes of a bst using the min heap property?
    ... >) Sorting takes O. ... >You can sort an array by inserting all the items into a binary search ... >tree, and then traversing the tree. ...
    (comp.programming)