Re: Reading text files using pointers?
From: Jon Skeet [C# MVP] (skeet_at_pobox.com)
Date: 02/12/04
- Next message: Jon Skeet [C# MVP]: "Re: StreamReader Error -he path is too long"
- Previous message: Ignacio Machin \( .NET/ C# MVP \): "Re: StreamReader Error -he path is too long"
- In reply to: Einar Høst: "Re: Reading text files using pointers?"
- Next in thread: Einar Buffer: "Re: Reading text files using pointers?"
- Reply: Einar Buffer: "Re: Reading text files using pointers?"
- Messages sorted by: [ date ] [ thread ]
Date: Thu, 12 Feb 2004 13:51:43 -0000
Einar Høst <anonymous@discussions.microsoft.com> wrote:
> You see, I'm a bit of a curious newbie. I don't really have any
> performance problems, I just want to try tweaking my routine in order
> to learn more about performance and .NET. It's a learning thing more
> than a necessity. And one of the things I learn from is good questions
> that make me think :-)
While in general I applaud such sentiments (and I like tinkering with
things myself) I would recommend avoiding unsafe code until you
*really* need it. I haven't even *looked* at it myself, on the grounds
that I can't see myself needing it and if I don't actually know it,
I'll be less tempted to start using it where I don't really need it.
> Regarding the encoding of the file, I'm not really sure what it is,
> but it seems that the number of bytes in the file corresponds to the
> number of characters. Is there any check I can do to determine the
> encoding precisely?
Not really - it could be any number of encodings. What's producing the
file in the first place?
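As a rough illustration of how limited such a check is, the only thing that can be tested cheaply is whether the file starts with a byte order mark (BOM). The sketch below is an assumption-laden example, not a detector: the class and method names are made up for illustration, and a file without a BOM (which is common) simply yields null, leaving the question open.

```csharp
using System;

static class BomSniffSketch
{
    // Hypothetical helper: returns the encoding suggested by a byte order
    // mark at the start of 'head', or null when no known BOM is present.
    // A null result does NOT identify the encoding; it only means "no BOM".
    public static string GuessFromBom(byte[] head)
    {
        // UTF-32 LE must be tested before UTF-16 LE: both start with FF FE.
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE &&
            head[2] == 0x00 && head[3] == 0x00)
            return "UTF-32 (little-endian)";
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return "UTF-8";
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return "UTF-16 (little-endian)";
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return "UTF-16 (big-endian)";
        return null;
    }
}
```

If the byte count really does equal the character count, the file is probably in some single-byte encoding or is plain ASCII, but which one still depends on whatever produced it.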
> I don't know if BinaryReader is any faster than StreamReader. The move
> from StreamReader to BinaryReader was basically done because I wanted
> to do without the ReadLine method.
Well, you can use StreamReader without using ReadLine. You can read a
character at a time, or a block of characters.
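A minimal sketch of the block-of-characters approach, assuming all that is needed is to look at the characters without building a string per line. StreamReader inherits Read(char[], int, int) from TextReader, so the method below accepts any TextReader; the class and method names are hypothetical. The parameterless Read() works the same way one character at a time and returns -1 at the end of the stream.

```csharp
using System;
using System.IO;

static class BlockReadSketch
{
    // Counts '$' characters by reading blocks of chars into a reusable
    // buffer. No string is created for any line of the input.
    public static int CountDollars(TextReader reader)
    {
        char[] buffer = new char[4096];
        int count = 0;
        int read;
        // Read returns the number of chars actually placed in the buffer,
        // which may be fewer than requested, and 0 at the end of the input.
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == '$')
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

With a file, this would be called as CountDollars(new StreamReader(path)), where path is whatever file is being read.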
> You see, I'm reading this file
> containing messages starting with a dollar sign. I just want one of
> six messages, and so I thought I'd create less garbage by not
> creating strings for the unwanted messages. I was examining the app in
> CLR profiler, and found I allocated 16MB to read a file of 3.5MB, of
> which 8.5MB was strings.
That sounds about right, yes, assuming the whole thing was being loaded
at once.
> Creating my own byte-buffer and searching for
> dollar signs has reduced the allocation amount to approximately 6.5MB,
> of which 3.5MB is bytes and 2.0MB is strings. The number of garbage
> collections went down from 44 to 4! I've doubled the speed of my
> routine. To try to squeeze out a little more, I decided to experiment
> with unsafe code, but so far, I haven't got much effect out of that. I
> guess the benefits are small compared to the other processing I'm
> doing in my routine...
To find the dollars, you could read chunks in at a time (e.g. 16K
chars) into a fixed buffer, and search within that buffer. When you've
found the appropriate dollar, read the rest of that buffer and then all
the buffers after that (or whatever).
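One way the chunked search could look, as a sketch under several assumptions that are not stated in the thread: each message starts with '$', ends at a line break, and the wanted message type can be recognised by a fixed prefix that begins with '$' and contains no other '$'. The names DollarScanSketch, ReadWanted and wantedPrefix are hypothetical. This variant collects every matching message rather than stopping at the first one, and it keeps its state across chunk boundaries so a message split between two buffers is still found.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class DollarScanSketch
{
    // Scans the input in 16K-char chunks and returns only the messages
    // that start with wantedPrefix. Strings are created for wanted
    // messages only; everything else is skipped inside the buffer.
    public static List<string> ReadWanted(TextReader reader, string wantedPrefix)
    {
        if (string.IsNullOrEmpty(wantedPrefix) || wantedPrefix[0] != '$')
            throw new ArgumentException("Prefix must start with '$'.", "wantedPrefix");

        List<string> wanted = new List<string>();
        StringBuilder message = new StringBuilder();
        char[] buffer = new char[16 * 1024];
        int matched = 0;        // prefix chars matched so far
        bool capturing = false; // true while copying a wanted message
        int read;

        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                char c = buffer[i];
                if (capturing)
                {
                    if (c == '\r' || c == '\n')
                    {
                        // End of the wanted message: keep it and resume scanning.
                        wanted.Add(message.ToString());
                        message.Length = 0;
                        capturing = false;
                    }
                    else
                    {
                        message.Append(c);
                    }
                }
                else if (c == wantedPrefix[matched])
                {
                    matched++;
                    if (matched == wantedPrefix.Length)
                    {
                        message.Append(wantedPrefix);
                        capturing = true;
                        matched = 0;
                    }
                }
                else
                {
                    // Mismatch: restart, allowing this char to be a new '$'.
                    matched = (c == '$') ? 1 : 0;
                }
            }
        }

        // A wanted message may end at end-of-input without a line break.
        if (capturing)
        {
            wanted.Add(message.ToString());
        }
        return wanted;
    }
}
```

Because the buffer and the StringBuilder are reused, the allocations are roughly one string per wanted message plus the fixed buffer, which is the effect the byte-buffer approach in the quoted text was after.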
--
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too