Re: How to search files for text string most efficiently?


From: Richard Jalbert (richmann_at_sympatico.ca)
Date: 10/29/04


Date: Fri, 29 Oct 2004 01:43:14 GMT

On Thu, 28 Oct 2004 20:52:17 +0200, "Herfried K. Wagner [MVP]"
<hirf-spam-me-here@gmx.at> wrote:

>"Richard Jalbert" <richmann@sympatico.ca> wrote:
>>>I am working on a small Windows application for a client, and as one of
>>>the functions they want a search that will let them enter a search string,
>>>then search a directory for all files that contain that search string AND
>>>display the lines that contain the search string.
>> [...]
>> Then, using that array, get each filename and read each file with
>> Open ... For Binary, loading its contents into a single string buffer
>> sized from the file length, on which you could call the InStr
>> function.
>
>If the files are "small", that's a good approach.

Isn't the maximum size of a string buffer something like 2 billion
characters?
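A minimal sketch of the small-file approach, written in Python rather than VB for brevity: each file in the directory is read whole, tested with a single substring search (the role InStr plays above), and only then split into lines so the matching lines can be displayed. The function name `search_directory` and the latin-1 decoding (chosen because it accepts any byte) are assumptions, not anything from the thread.

```python
import os

def search_directory(directory, needle, encoding="latin-1"):
    """Hypothetical helper: return (filename, line_number, line) for every
    line of every file in `directory` that contains `needle`.

    Each file is read whole into memory (the "small file" approach).
    """
    results = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        # newline="" keeps CR LF pairs as-is; splitlines() handles them below.
        with open(path, "r", encoding=encoding, newline="") as f:
            text = f.read()
        # One cheap test on the whole buffer, like InStr on the full string.
        if needle not in text:
            continue
        # Only files that match are split into lines for display.
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                results.append((name, number, line))
    return results
```

Splitting into lines only after the whole-buffer test means non-matching files cost one search each.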

>If the files are large,

What would be a large file?
I have one that is 214 MB (pi to a million places; I concatenated it
from 20 smaller files) and I cannot open it on my machine.

>it's trickier: you'll have to read the file in chunks of a certain size and
>then perform 'InStr'. Note that you will have to check separately for
>occurrences that overlap the ends of two chunks.

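Overlap is easily handled: read the first buffer, then, when reading
the second, move the file pointer back by the length of the substring
minus one byte. Backing up by the full length also catches every
straddling match, but can report a match that ends exactly at the
chunk boundary twice.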
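The chunked search with overlap might look like the following Python sketch (again not VB; the name `find_in_file_chunked` and the 1 MB default chunk size are hypothetical). It carries the last len(needle) - 1 bytes of each buffer into the next one, so a match straddling a chunk boundary is found, and since a full match cannot fit inside the carried bytes alone, nothing is reported twice.

```python
def find_in_file_chunked(path, needle, chunk_size=1 << 20):
    """Hypothetical helper: return the byte offset of every occurrence of
    `needle` (a bytes object) in the file, reading chunk_size bytes at a time."""
    if not needle:
        raise ValueError("needle must not be empty")
    keep = len(needle) - 1   # bytes carried over from one buffer to the next
    offsets = []
    tail = b""               # last `keep` bytes of the previous buffer
    base = 0                 # file offset of the first byte of the current buffer
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            pos = buf.find(needle)
            while pos != -1:
                offsets.append(base + pos)
                pos = buf.find(needle, pos + 1)  # pos + 1 allows overlapping matches
            # Keep len(needle) - 1 bytes: enough for a straddling match,
            # too few to hold a whole match that was already reported.
            tail = buf[-keep:] if keep else b""
            base += len(buf) - len(tail)
    return offsets
```

For a file like the 214 MB one above, memory use stays near the chunk size (plus the list of offsets) regardless of the file's length.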

One detail that was not stated: what if the substring is split by a
vbCRLF? That would mean the line breaks would have to be removed from
the file before doing the search, no?
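If a match may be split by line breaks, one option in VB is to strip them first, e.g. `Replace(buffer, vbCrLf, "")`, before calling InStr. The Python sketch below (the name `find_ignoring_line_breaks` is hypothetical) does the same, but also remembers which line each kept character came from, so matches can still be reported by line. It assumes LF or CR LF line endings.

```python
def find_ignoring_line_breaks(text, needle):
    """Hypothetical helper: search `text` as if every CR and LF had been
    removed, returning the 1-based line number where each match starts."""
    needle = needle.replace("\r", "").replace("\n", "")
    if not needle:
        raise ValueError("needle must contain something besides line breaks")
    kept = []      # characters of text with CR and LF removed
    line_of = []   # line_of[i] is the line number kept[i] came from
    line = 1
    for ch in text:
        if ch == "\n":
            line += 1            # LF ends a line (CR LF or bare LF)
        elif ch != "\r":
            kept.append(ch)
            line_of.append(line)
    flat = "".join(kept)
    starts = []
    pos = flat.find(needle)
    while pos != -1:
        starts.append(line_of[pos])
        pos = flat.find(needle, pos + 1)
    return starts
```

For example, with the digits of pi stored five per line, a search for "5926" finds a match starting on line 1 even though no single line contains it.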

**********************************************************************
Richard Jalbert Programmer-Analyst Richmann@sympatico.ca

Dogs have owners, cats have staff.

http://www3.sympatico.ca/richmann/
**********************************************************************


