Re: Compare Instr() and IndexOf() performance



Hi, Christopher
The largest file is 780 KB. The files consist of fixed-length records
(17 characters each), and I always search them on the first 10 characters
(the user can't modify these files, but their contents change every
day). My approach is to read the file into one string and then
search with string1.IndexOf(string2); if string2 is found in the file, I
return the result with string1.SubString(pos, 17).
How can I create an index like a search engine uses? Do you have any
references for this approach?
I appreciate your help.
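A minimal sketch of the approach described above, written in Python for illustration rather than .NET (str.find plays the role of IndexOf and slicing plays the role of SubString). It assumes the records are stored back to back with no line terminators between them, and the names are hypothetical. One thing worth noting: a plain IndexOf can also match the key inside a record or straddling two records, so the sketch only accepts matches that start on a 17-character boundary.

```python
from typing import Optional

RECORD_LEN = 17  # fixed record length described in the post
KEY_LEN = 10     # searches always match the first 10 characters


def find_record(data: str, key: str) -> Optional[str]:
    """Linear scan in the style of IndexOf + SubString(pos, 17).

    A raw find() can hit the key in the middle of a record or across
    two records, so only matches that start a record are accepted.
    """
    pos = data.find(key)
    while pos != -1:
        if pos % RECORD_LEN == 0:
            return data[pos:pos + RECORD_LEN]
        # Misaligned hit: keep scanning from the next character.
        pos = data.find(key, pos + 1)
    return None
```

This is still a scan of the whole string on every search, which is why it gets slow as the files grow.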

"elena" wrote:

Hi, Christopher
My files use fixed-length records: the largest file has 17
bytes per record. Only one file, which never exceeds 150 records,
is a sequential file.
Any advice, please?
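Because every record is exactly 17 bytes, record n starts at byte n * 17, so a single record can be read directly by seeking to its offset instead of loading the whole file into one string. A small Python sketch, illustrative only: it assumes the records are stored back to back with no separators, and the function names are hypothetical.

```python
import io

RECORD_LEN = 17  # bytes per record, per the post


def read_record(f, n: int) -> bytes:
    """Read record number n (0-based) straight from its byte offset."""
    f.seek(n * RECORD_LEN)
    rec = f.read(RECORD_LEN)
    if len(rec) != RECORD_LEN:
        raise IndexError(f"record {n} is past the end of the file")
    return rec


def record_count(f) -> int:
    """Number of whole records in a binary file object."""
    f.seek(0, io.SEEK_END)
    return f.tell() // RECORD_LEN
```

On its own this does not make searching faster, but it means an index only has to store record numbers.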

"Christopher Fairbairn [MVP]" wrote:

Hi,

"elena" <elena@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:B77129EB-2D66-4DB3-9ADF-44A758628561@xxxxxxxxxxxxxxxx
It takes up to 15 seconds to search the files if I use str1.IndexOf(str2)
and then str1.Substring(). The searched files are not that large at
all: file1 - 780 KB, file2 - 340 KB, file3 - 125 KB.

I doubt you will be able to get acceptable performance from an end-user
perspective with this approach, especially if those files grow
over time. At least not without putting a "please wait" type message in place.

If you just calculate the maximum number of string comparisons that may be
required to see whether an arbitrary string is found anywhere within 1.2 MB of
data, you will notice that even that number is quite high, especially if the
search operation occurs multiple times.
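To put a rough number on it: a naive substring search may try every start position in the text and compare up to the full pattern length at each one. A quick Python calculation for a 10-character key over the three files mentioned, assuming 1 KB = 1024 bytes and one byte per character, gives nearly 13 million character comparisons per search in the worst case. This is an upper bound (most positions fail on the first character), but it applies again to every search.

```python
def worst_case_char_comparisons(text_len: int, pattern_len: int) -> int:
    """Upper bound for a naive substring search: each of the
    (text_len - pattern_len + 1) start positions may compare
    up to pattern_len characters."""
    if pattern_len > text_len:
        return 0
    return (text_len - pattern_len + 1) * pattern_len


# The three files from the thread, assuming 1 KB = 1024 bytes.
total_bytes = (780 + 340 + 125) * 1024
per_search = worst_case_char_comparisons(total_bytes, 10)
print(total_bytes, per_search)  # 1274880 12748710
```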

What is the structure of your 3 files? Are they free-form text or is there
some kind of structure to them?

10-15 seconds is a long wait for the users. How can I
improve/speed up the search?

How often do the files change? Are they fairly static reference data or
something the user changes frequently?

One possible technique could be to produce an index (much as a search
engine does) of possible search terms and their known locations within the
source files.

This way you can look up the index and find out where that term can be found
within your original text file(s).
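A minimal sketch of such an index, assuming the fixed-length layout elena described (17-byte records, 10-byte key at the start of each record, no separators). It is written in Python for illustration; in .NET the same idea could be expressed with a Dictionary<string, int> or a Hashtable. The index maps each key to its record number and only needs rebuilding when the file changes (daily, according to the thread); after that each lookup is a hash lookup instead of a scan of the whole file. The function names are hypothetical.

```python
from typing import Dict, Optional

RECORD_LEN = 17  # bytes per record
KEY_LEN = 10     # key occupies the first 10 bytes of each record


def build_index(data: bytes) -> Dict[bytes, int]:
    """Map each record's 10-byte key to its record number.

    Rebuild this whenever the underlying file changes.
    """
    index: Dict[bytes, int] = {}
    for n in range(len(data) // RECORD_LEN):
        start = n * RECORD_LEN
        key = data[start:start + KEY_LEN]
        index.setdefault(key, n)  # keep the first record if a key repeats
    return index


def lookup(data: bytes, index: Dict[bytes, int], key: bytes) -> Optional[bytes]:
    """Return the full 17-byte record for key, or None if it is absent."""
    n = index.get(key)
    if n is None:
        return None
    start = n * RECORD_LEN
    return data[start:start + RECORD_LEN]
```

Lookups need the key padded to exactly 10 bytes, because the dictionary matches whole keys rather than prefixes.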

Hope this helps,
Christopher Fairbairn


