Re: Which data structure should Iuse for fast retrieval

Tech-Archive recommends: Repair Windows Errors & Optimize Windows Performance

From: Jase (jshelley_at_spamblock.enersol.com.au)
Date: 04/16/04


Date: Fri, 16 Apr 2004 15:05:09 +1000


"YKUT" <ykutanoor2003@yahoo.com> wrote in message
news:e75a53c8.0404142055.76afcaf2@posting.google.com...
> Hi,
> Which is the most preffered & fastest way of retrievals /searching
> mechanism wrt Data stuctutes/hash tables ?
>
> Is CList,CMap,CMapString ?
>
> Pls clarify as I am doing frequent searches in a file.My idea is read
> from this file and dump this in to one data structure.
>
> Pls guide.
>
> Regards
> YKUT

As others have mentioned, without knowing how much data you have, there is
no simple answer. However, if you want to find strings in a file, by far the
quickest way is to build a suffix tree. The tree construction can be done in
O(n) time (linear with respect to file size). The children of the root node
are all the different characters in the file, so searching for a specific
string involves finding a starting point (max 256 compares) then following
the links. That means that no matter what the file size, the maximum number
of compares to find a given string is 256 * string length (pathological
worst case for a random binary file). Given that most letters will not have
the entire ascii alphabet as children, the usual case is very much better.
It is a bit of a memory hog, but VERY fast for searching.

There are heaps of resources for suffix tree construction on the web. Just
do a bit of googling.

Jase



Relevant Pages

  • [SUMMARY] Longest Repeated Substring (#153)
    ... string size, try to find a string of that length that occurs twice in the full ... longest repeated substring is typically less than 200 bytes. ... The idea of a suffix tree is to build a list of every suffix that occurs in the ... if distance < at_least_size ...
    (comp.lang.ruby)
  • Re: Regular Expressions: large amount of ors
    ... :> certain predefined words in that string. ... The internal data structure that encodes that set of keywords is ... doing mutiple keyword matching; I had an older module that wrapped ... around a suffix tree: ...
    (comp.lang.python)
  • Re: Big O and algorithm to decide string A contains string B?
    ... > One simple possibility is to iterate through each position i within string ... Way too slow if your pattern is 1k chars long and your text is 5 million ... suffix array construction, which is. ... O, I think, for suffix tree ...
    (comp.programming)
  • Re: Big O and algorithm to decide string A contains string B?
    ... > One simple possibility is to iterate through each position i within string ... Way too slow if your pattern is 1k chars long and your text is 5 million ... suffix array construction, which is. ... O, I think, for suffix tree ...
    (comp.lang.cpp)