Re: Reading an Ascii string
- From: Göran Andersson <guffa@xxxxxxxxx>
- Date: Sat, 08 Jul 2006 22:27:11 +0200
Here you can see the most common DOS code pages:
http://en.wikipedia.org/wiki/Codepage
Code page 850 is the most likely one for a British computer.
John wrote:
Thanks for all your replies.
Just to clarify...
"ASCII is 7-bit, and any character above 127 will need the proper encoding to be read right. I'm assuming the characters are stored as 8 bit."
Sorry for being imprecise. When I said "Ascii (ie 8-bit) characters", I meant that they are stored as bytes rather than as 16-bit quantities as Unicode requires. The characters all look 7-bit at a quick glance, but really I ought to write a quick test programme to check this, which would find and flag any non-ASCII characters. Then I could try to deduce what the encoding is. The original DOS-based [stock control] programme that created the files was written by a UK company, and it was run on a PC in the UK to generate them. Is there a standard type of encoding for the UK? Is this likely to be extended ASCII, for which Göran Andersson suggested using Encoding.GetEncoding(850)? Thanks Jon Skeet for your comment about extended ASCII. I suppose that there could potentially be things like a letter e with an acute accent, and I don't want to mangle these. Once I've discovered the encoding, I'm pretty certain that it will be consistent across all the files.
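A quick check along those lines could look something like the sketch below. It only inspects a byte array that is passed in, so it assumes the string fields have already been picked out of the file (scanning the whole file would also flag ordinary binary ints and bytes). The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class NonAsciiScan
{
    // Returns the offsets of every byte above 0x7F, i.e. outside 7-bit ASCII.
    // A hit such as 0x82 (e-acute in code page 850) hints at an extended
    // character set rather than plain ASCII.
    public static List<int> FindNonAsciiOffsets(byte[] data)
    {
        List<int> offsets = new List<int>();
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] > 0x7F)
            {
                offsets.Add(i);
            }
        }
        return offsets;
    }
}
```

If the returned list is empty for every string field, the choice between ASCII and code page 850 makes no difference for these files, since the two agree on all values below 128.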
Göran Andersson: "Does the length include the padding or not?"
It does include padding, so a string with three characters appears as byte 0x14 (ie 20) followed by the three characters followed by 17 space characters. I will trim the string.
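Based on that layout, reading one string might be sketched roughly as follows. This is only an illustration: the helper name is hypothetical, the encoding is passed in because code page 850 is still a guess at this point, and it assumes that a length byte of 0 is followed by no character bytes.

```csharp
using System;
using System.IO;
using System.Text;

public static class LegacyStrings
{
    // Reads one length-prefixed string: a single length byte (padding included),
    // then that many character bytes, with trailing spaces removed.
    // Assumption: a length byte of 0 means no character bytes follow.
    public static string ReadPaddedString(BinaryReader reader, Encoding encoding)
    {
        int length = reader.ReadByte();
        if (length == 0)
        {
            return string.Empty;
        }

        byte[] raw = reader.ReadBytes(length);
        if (raw.Length < length)
        {
            // ReadBytes returns fewer bytes when the stream ends early.
            throw new EndOfStreamException("File ended in the middle of a string.");
        }

        // Decode with the supplied encoding, then drop the space padding.
        return encoding.GetString(raw).TrimEnd(' ');
    }
}
```

It would be called with something like LegacyStrings.ReadPaddedString(_br, Encoding.GetEncoding(850)). On the .NET Framework that encoding is available directly; on .NET Core and later, Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) has to be called once beforehand.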
"John" <-> wrote in message news:OMR74pooGHA.3304@xxxxxxxxxxxxxxxxxxxxxxx
Hi,
I'm a beginner in using C# and .NET.
I have big legacy files that store various values (ints, bytes, strings) and want to read them into
a C# programme so that I can store them in a database. The files are written by a late 1980's PC
Pascal programme, for which I don't have the source code. I've managed to reverse engineer the file
format.
The strings are stored as Ascii in the file, with the first byte indicating the string length, and
the rest are the Ascii (ie 8-bit) characters. The string length is always 0, 20 or 40 characters
(never any more) and strings are end-padded with space characters where necessary.
What is the best way to quickly read a string and get rid of the space padding at the end? To make
sure I can read them correctly, I'll put them in a text box. I assume the string used in a text box
uses 16-bit characters (Unicode?) but I may be wrong here. When I'm happy I can read them correctly,
I'll get rid of the text box and store them directly in the database. Is it best to store it in the
database as Unicode? I'm tempted to use Ascii for efficiency.
I was thinking of using a binary reader (_br) to extract from the file. That should be fine for
everything, but I don't know how to cope with the Ascii strings.