Re: optimizing file i/o
- From: Jon Skeet [C# MVP] <skeet@xxxxxxxxx>
- Date: Sun, 17 Apr 2005 06:45:51 +0100
Michael Powe <michael+tbird@xxxxxxxxxxxx> wrote:
> I have written a small app to parse web log files and extract certain
> lines to another file. There is also functionality to count all the
> items that are being filtered out.
>
> I wrote this in c# instead of in perl because the log files are 3-4GB
> and I want faster processing than perl would typically provide. And,
> I'm learning c#.
>
> There are two issues I would like to address: improve the speed of the
> file i/o and control the processing. Right now, this app takes about 20
> min to process a 3GB file on a laptop with a 2Ghz proc and 2GB RAM.
> Processing is implementing a method that both filters and counts. Also,
> it pegs my CPU while it's running.
>
> Below are the filtering and filtering/counting methods.
If the CPU is pegged (which I can understand, given the code), then the
I/O speed isn't the bottleneck: the time is going into processing each
line, not into reading the file.
Some suggestions:
1) Don't create the regular expressions freshly each time. I don't know
whether you've got a lot of small files or just a few big ones, but it
would make more sense to create them once, as you don't need to change
them.
2) Use the option to compile the regular expressions when you create
them (RegexOptions.Compiled). This could improve things enormously.
3) Rather than using a hashtable, consider having an array of ints
along with your array of regular expressions. You could then iterate
through the regular expression array by index rather than by value, and
just increment the relevant int - no hashtable lookup, no unboxing and
then reboxing.
4) If most lines in the file will match one of the filters, try getting
rid of the "all" regular expression, working out the result just by
running all the others. It may not help, but it's worth a try.
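As a rough sketch of how suggestions 1 to 4 might fit together (not the original poster's code, which isn't reproduced in this message): the class name LogFilter, the method IsFiltered and the three patterns below are all made up for illustration. The regular expressions are built once with RegexOptions.Compiled, the counts live in an int array parallel to the regex array, and there is no separate "all" expression, since the result falls out of running the individual filters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LogFilter
{
    // Hypothetical patterns: the real filters are not shown in this thread.
    // Built once, with RegexOptions.Compiled, rather than once per line or per file.
    private static readonly Regex[] Filters =
    {
        new Regex(@"\.gif\b", RegexOptions.Compiled | RegexOptions.IgnoreCase),
        new Regex(@"\.css\b", RegexOptions.Compiled | RegexOptions.IgnoreCase),
        new Regex(@"/robots\.txt\b", RegexOptions.Compiled | RegexOptions.IgnoreCase),
    };

    // Counts[i] is the number of lines matched by Filters[i].
    // A plain int array avoids the hashtable lookup and the boxing/unboxing.
    public static readonly int[] Counts = new int[Filters.Length];

    // Returns true if the line matched one of the filters (and so should be
    // dropped), incrementing the counter for the first filter that matched.
    public static bool IsFiltered(string line)
    {
        for (int i = 0; i < Filters.Length; i++)
        {
            if (Filters[i].IsMatch(line))
            {
                Counts[i]++;
                return true;
            }
        }
        return false;
    }
}
```

Whether dropping the "all" expression actually helps depends on how many lines match a filter, so it's worth timing both versions against a real log file.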
Finally, use using statements for your stream readers and writers -
that way, if an exception is thrown, you'll still close the file
immediately.
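For the using statements, something along these lines should do. Again, this is only a sketch: LogCopy, CopyMatching and the keep callback are names invented here, not anything from the original code. Both files are closed when the using block is left, whether normally or because an exception was thrown.

```csharp
using System;
using System.IO;

public static class LogCopy
{
    // Copies every line for which keep(line) returns true from inputPath to
    // outputPath, and returns the number of lines written.
    // (Hypothetical helper; the names are illustrative only.)
    public static int CopyMatching(string inputPath, string outputPath, Predicate<string> keep)
    {
        int written = 0;

        // Both streams are disposed (and so closed) on leaving this block,
        // even if ReadLine, WriteLine or keep throws.
        using (StreamReader reader = new StreamReader(inputPath))
        using (StreamWriter writer = new StreamWriter(outputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (keep(line))
                {
                    writer.WriteLine(line);
                    written++;
                }
            }
        }

        return written;
    }
}
```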
--
Jon Skeet - <skeet@xxxxxxxxx>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too