Re: Reading text files using pointers?

From: Einar Buffer (_ebuffer__at_hotmail.com)
Date: 02/12/04


Date: Thu, 12 Feb 2004 16:24:24 +0100


"Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
news:MPG.1a95954ea390938398a102@msnews.microsoft.com...
> Einar Høst <anonymous@discussions.microsoft.com> wrote:
> > You see, I'm a bit of a curious newbie. I don't really have any
> > performance problems, I just want to try tweaking my routine in order
> > to learn more about performance and .NET. It's a learning thing more
> > than a necessity. And one of the things I learn from is good questions
> > that make me think :-)
>
> While in general I applaud such sentiments (and I like tweaking with
> things myself) I would recommend avoiding unsafe code until you
> *really* need it. I haven't even *looked* at it myself, on the grounds
> that I can't see myself needing it and if I don't actually know it,
> I'll be less tempted to start using it where I don't really need it.

Indeed, I think it will be a while before I use it in any of my
professional work - if ever. As it turns out, the unsafe approach was
slightly faster (5-10%) than the type-safe one when I did no other
processing and just scanned for dollar signs. However, once I added the
rest of the processing - with the scanning code itself unchanged - the
difference evened out. Perhaps that is a side effect of keeping memory
pinned for a prolonged time?

> > Regarding the encoding of the file, I'm not really sure what it is,
> > but it seems that the number of bytes in the file corresponds to the
> > number of characters. Is there any check I can do to determine the
> > encoding precisely?
>
> Not really - it could be any number of encodings. What's producing the
> file in the first place?

I guess I could find out - it's a data logging program written by a
co-worker. It reads data from the serial port and writes it to a file.
It's written in C++, so I'd guess he used whatever default the Win32 API
offers, if there is one.
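As a rough heuristic (not a way around Jon's point that it could be any number of encodings), one can at least check for a byte-order mark and for bytes above 0x7F. This is a minimal sketch assuming the file, or a sample of it, has been read into a byte[]; the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper: a heuristic, not a definitive encoding detector.
public static class EncodingSniffer
{
    // Describes the byte-order mark at the start of the data, if any.
    // UTF-32 BOMs are not distinguished here (FF FE 00 00 would be
    // reported as UTF-16 little-endian).
    public static string DescribeBom(byte[] data)
    {
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return "UTF-8 (BOM)";
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return "UTF-16 little-endian (BOM)";
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return "UTF-16 big-endian (BOM)";
        return "no BOM";
    }

    // True if every byte is 7-bit ASCII. In that case ASCII, UTF-8 and
    // Windows-1252 all decode the data identically, one char per byte.
    public static bool IsPureAscii(byte[] data)
    {
        foreach (byte b in data)
        {
            if (b >= 0x80) return false;
        }
        return true;
    }
}
```

If IsPureAscii returns true, the exact encoding mostly stops mattering, which would be consistent with the byte count matching the character count. For the BOM part, StreamReader can do the detection itself: `new StreamReader(path, Encoding.ASCII, true)` switches encoding if it finds a BOM, and its CurrentEncoding property reports the result after the first Read call.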

> > I don't know if BinaryReader is any faster than StreamReader. The move
> > from StreamReader to BinaryReader was basically done because I wanted
> > to do without the ReadLine method.
>
> Well, you can use StreamReader without using ReadLine. You can read a
> character at a time, or a block of characters.

Yeah, I guess you're right. Still, C# characters are 16-bit, right? And in
general, I wonder whether there is any performance difference between the
two classes when they are used for the same task.
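To illustrate using StreamReader without ReadLine, here is a minimal sketch that reads fixed-size blocks of chars into one reused array and counts '$' characters. CharBlockScanner and the 16K default block size are illustrative choices, not code from the thread. On the 16-bit question: yes, a C# char is a 16-bit UTF-16 code unit, so `sizeof(char)` is 2.

```csharp
using System;
using System.IO;

// Hypothetical helper: scans a TextReader block by block instead of line by line.
public static class CharBlockScanner
{
    // Counts '$' characters. The reader decodes bytes into 16-bit chars,
    // so a block of N chars takes 2N bytes of memory even when the file
    // itself uses one byte per character.
    public static int CountDollars(TextReader reader, int blockSize = 16 * 1024)
    {
        char[] buffer = new char[blockSize]; // allocated once and reused
        int count = 0;
        int read;
        // Read returns 0 only at the end of the input.
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Only the first 'read' entries are valid for this block.
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == '$') count++;
            }
        }
        return count;
    }
}
```

Something like `CountDollars(new StreamReader(path, Encoding.ASCII))` would apply it to a file (`path` being a placeholder). Compared with reading raw bytes through a Stream or BinaryReader, StreamReader also has to decode each byte into a char, so some extra cost is plausible; whether it is noticeable is something to measure rather than assume.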

> > Creating my own byte-buffer and searching for
> > dollar signs has reduced the allocation amount to approximately 6.5MB,
> > of which 3.5MB is bytes and 2.0MB is strings. The number of garbage
> > collections went down from 44 to 4! I've doubled the speed of my
> > routine. To try to squeeze out a little more, I decided to experiment
> > with unsafe code, but so far, I haven't got much effect out of that. I
> > guess the benefits are small compared to the other processing I'm
> > doing in my routine...
>
> To find the dollars, you could read chunks in at a time (e.g. 16K
> chars) into a fixed buffer, and search within that buffer. When you've
> found the appropriate dollar, read the rest of that buffer and then all
> the buffers after that (or whatever).
>

Indeed, this is roughly what I do: I read 32K bytes, scan for dollar signs,
check the message type, skip some bytes if it's one of the five types I
don't want, and parse it otherwise. If I need some extra bytes to determine
the message type or read the message content, I read the amount I need from
the stream.
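Two pieces of that loop can be sketched in type-safe code: scanning 32K chunks of one reused byte buffer for '$' (0x24), and reading "the amount I need" reliably. The second matters because Stream.Read may return fewer bytes than requested, so a single call is not guaranteed to fill the buffer. DollarScanner and its methods are hypothetical names, and the message-type handling is left out since the format is not described here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helpers for a chunked, type-safe byte scan.
public static class DollarScanner
{
    // Reads the stream in chunks into one reused buffer and returns the
    // absolute stream offset of every '$' byte.
    public static List<long> FindDollarOffsets(Stream stream, int chunkSize = 32 * 1024)
    {
        var offsets = new List<long>();
        byte[] buffer = new byte[chunkSize];
        long basePosition = 0; // stream offset of buffer[0]
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            int start = 0;
            while (start < read)
            {
                // Search only the valid part of the buffer.
                int hit = Array.IndexOf(buffer, (byte)'$', start, read - start);
                if (hit < 0) break;
                offsets.Add(basePosition + hit);
                start = hit + 1;
            }
            basePosition += read;
        }
        return offsets;
    }

    // Loops until exactly 'count' bytes have been read into 'target'.
    // Returns false if the stream ends first.
    public static bool TryReadExactly(Stream stream, byte[] target, int offset, int count)
    {
        while (count > 0)
        {
            int n = stream.Read(target, offset, count);
            if (n == 0) return false; // end of stream
            offset += n;
            count -= n;
        }
        return true;
    }
}
```

Because '$' is a single byte, a match can never straddle two chunks; only the bytes that follow it can, which is where a helper like TryReadExactly comes in.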

Thanks again!


