Re: Reading text files using pointers?

From: Jon Skeet [C# MVP] (skeet_at_pobox.com)
Date: 02/12/04


Date: Thu, 12 Feb 2004 13:51:43 -0000

Einar Høst <anonymous@discussions.microsoft.com> wrote:
> You see, I'm a bit of a curious newbie. I don't really have any
> performance problems, I just want to try tweaking my routine in order
> to learn more about performance and .NET. It's a learning thing more
> than a necessity. And one of the things I learn from is good questions
> that make me think :-)

While in general I applaud such sentiments (and I like tweaking with
things myself) I would recommend avoiding unsafe code until you
*really* need it. I haven't even *looked* at it myself, on the grounds
that I can't see myself needing it and if I don't actually know it,
I'll be less tempted to start using it where I don't really need it.
 
> Regarding the encoding of the file, I'm not really sure what it is,
> but it seems that the number of bytes in the file corresponds to the
> number of characters. Is there any check I can do to determine the
> encoding precisely?

Not really - it could be any number of encodings. What's producing the
file in the first place?
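There is one check that does help, though it doesn't identify the encoding. Below is a sketch in C# (not from the original thread; the class and method names are hypothetical). If the file contains no NUL byte and no byte at or above 0x80, it decodes to the same characters under ASCII, UTF-8, ISO-8859-1 and Windows-1252, one character per byte, so for that particular file the exact choice doesn't matter. A NUL byte is treated as a warning sign because plain ASCII text stored as UTF-16 always contains them.

```csharp
using System.IO;

// Hypothetical helper (sketch): reports whether a file holds only bytes 0x01-0x7F.
public static class EncodingCheck
{
    public static bool IsPlainAscii(string path)
    {
        byte[] buffer = new byte[16 * 1024]; // fixed buffer, reused for every read
        using (FileStream stream = File.OpenRead(path))
        {
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    // 0x00 suggests UTF-16 or binary data; >= 0x80 means the
                    // single-byte encodings and UTF-8 start to disagree.
                    if (buffer[i] == 0 || buffer[i] >= 0x80)
                    {
                        return false;
                    }
                }
            }
        }
        // Every byte is 0x01-0x7F: ASCII, UTF-8, ISO-8859-1 and Windows-1252
        // all decode this file identically, one character per byte.
        return true;
    }
}
```

A false result doesn't say what the encoding is either; at that point you really do need to know what produced the file.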

> I don't know if BinaryReader is any faster than StreamReader. The move
> from StreamReader to BinaryReader was basically done because I wanted
> to do without the ReadLine method.

Well, you can use StreamReader without using ReadLine. You can read a
character at a time, or a block of characters.
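A minimal sketch of reading a block at a time, in C# (the BlockReading class and CountChar method are hypothetical): it fills one reused char[] via TextReader.Read(char[], int, int), which StreamReader inherits, and never calls ReadLine, so no per-line strings are created. The one-character-at-a-time alternative is reader.Read(), which returns -1 at end of input.

```csharp
using System.IO;

public static class BlockReading
{
    // Counts occurrences of 'target' by reading blocks of characters rather
    // than lines. Works with any TextReader, including StreamReader.
    public static int CountChar(TextReader reader, char target)
    {
        char[] buffer = new char[16 * 1024]; // one buffer, reused for every block
        int count = 0;
        int read;
        // Read returns how many chars were actually read; 0 means end of input.
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == target)
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

With a real file (the name here is made up) you would wrap it as: using (StreamReader reader = new StreamReader("messages.txt")) { int dollars = BlockReading.CountChar(reader, '$'); }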

> You see, I'm reading this file
> containing messages starting with a dollar sign. I just want one of
> six messages, and so I thought I'd create less garbage by not
> creating strings for the unwanted messages. I was examining the app in
> CLR Profiler, and found I allocated 16MB to read a file of 3.5MB, of
> which 8.5MB was strings.

That sounds about right, yes, assuming the whole thing was being loaded
at once.
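The idea of not creating strings for unwanted messages can be sketched like this in C#. MessageFilter, ExtractMessages and the "$ABC" header are hypothetical, and the sketch assumes ASCII messages that start with '$' and end with '\n'. Only messages whose leading bytes match the wanted header are ever turned into strings; everything else is skipped at the byte level. It doesn't handle a message split across two buffer fills: whatever follows the last '$' in the data is treated as running to the end.

```csharp
using System.Collections.Generic;
using System.Text;

public static class MessageFilter
{
    // Scans data[0..length) for '$'-prefixed, '\n'-terminated messages and
    // creates a string only for those starting with wantedHeader.
    public static List<string> ExtractMessages(byte[] data, int length, string wantedHeader)
    {
        byte[] header = Encoding.ASCII.GetBytes(wantedHeader);
        var result = new List<string>();
        int i = 0;
        while (i < length)
        {
            if (data[i] != (byte)'$') { i++; continue; }

            // Find the end of this message (or the end of the data).
            int end = i;
            while (end < length && data[end] != (byte)'\n') end++;

            if (StartsWith(data, i, end, header))
            {
                int stop = end;
                if (data[stop - 1] == (byte)'\r') stop--; // drop a trailing CR
                result.Add(Encoding.ASCII.GetString(data, i, stop - i));
            }
            i = end + 1; // unwanted messages never become strings
        }
        return result;
    }

    // Byte-by-byte prefix comparison, so no string is needed for the check.
    private static bool StartsWith(byte[] data, int start, int end, byte[] prefix)
    {
        if (end - start < prefix.Length) return false;
        for (int k = 0; k < prefix.Length; k++)
        {
            if (data[start + k] != prefix[k]) return false;
        }
        return true;
    }
}
```

Taking a length parameter rather than using data.Length lets the same method work on a reused buffer that is only partly filled.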

> Creating my own byte-buffer and searching for
> dollar signs has reduced the allocation amount to approximately 6.5MB,
> of which 3.5MB is bytes and 2.0MB is strings. The number of garbage
> collections went down from 44 to 4! I've doubled the speed of my
> routine. To try to squeeze out a little more, I decided to experiment
> with unsafe code, but so far, I haven't got much effect out of that. I
> guess the benefits are small compared to the other processing I'm
> doing in my routine...

To find the dollars, you could read chunks in at a time (e.g. 16K
chars) into a fixed buffer, and search within that buffer. When you've
found the appropriate dollar, read the rest of that buffer and then all
the buffers after that (or whatever).
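A rough sketch of that chunked search in C#. The names, the 16K default and the header-matching rule are assumptions, not something from the post. It reuses one fixed char buffer and tracks a partial header match, so a header split across two chunks is still found. Once the header is found, it writes the header, the rest of that buffer and all later buffers to an output TextWriter. The simple "restart on '$'" matching is only correct because the header is required to start with '$' and contain no other '$'.

```csharp
using System;
using System.IO;

public static class ChunkedSearch
{
    // Returns false (writing nothing) if 'header' never appears in the input.
    public static bool CopyFromHeader(TextReader reader, string header,
                                      TextWriter output, int bufferSize = 16 * 1024)
    {
        if (header.Length == 0 || header[0] != '$' || header.IndexOf('$', 1) >= 0)
            throw new ArgumentException("header must start with '$' and contain no other '$'");
        if (bufferSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(bufferSize));

        char[] buffer = new char[bufferSize]; // one fixed buffer, reused
        int matched = 0; // chars of 'header' matched so far (may span chunks)
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                char c = buffer[i];
                if (c == header[matched]) matched++;
                else matched = (c == '$') ? 1 : 0; // a '$' can only restart the match

                if (matched == header.Length)
                {
                    output.Write(header);                      // may have begun in an earlier chunk
                    output.Write(buffer, i + 1, read - i - 1); // rest of this buffer
                    while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        output.Write(buffer, 0, read);         // all the buffers after that
                    }
                    return true;
                }
            }
        }
        return false;
    }
}
```

Passing a tiny bufferSize, such as 3, is a cheap way to exercise the case where the header straddles two chunks.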

-- 
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

