Re: Reading an Ascii string




Here you can see the most common DOS code pages:

http://en.wikipedia.org/wiki/Codepage

Codepage 850 is the most likely one for a British computer.
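Here is a minimal C# sketch of decoding the bytes as code page 850. It assumes a modern .NET runtime (.NET Core 3.0 / .NET 5 or later), where DOS code pages only become available after `CodePagesEncodingProvider` has been registered. On the classic .NET Framework, `Encoding.GetEncoding(850)` works without that step. The class name `DosText` is made up for illustration. In code page 850, byte 0x82 is "é" and 0x9C is "£". `Encoding.ASCII`, by contrast, turns every byte above 0x7F into "?".

```csharp
using System.Text;

// Sketch: decode DOS-era bytes as code page 850.
// Assumes .NET Core 3.0 / .NET 5+, where the DOS code pages must be enabled by
// registering CodePagesEncodingProvider. On .NET Framework that line is unnecessary.
public static class DosText
{
    // Created once, the first time the class is used.
    private static readonly Encoding Cp850 = CreateCp850();

    private static Encoding CreateCp850()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(850);
    }

    // Bytes 0x80-0xFF become accented letters, the pound sign, box-drawing
    // characters and so on, instead of being replaced with '?'.
    public static string Decode(byte[] bytes) => Cp850.GetString(bytes);
}
```

With this, `DosText.Decode(new byte[] { 0x63, 0x61, 0x66, 0x82 })` gives "café". `Encoding.ASCII.GetString` on the same four bytes gives "caf?".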

John wrote:
Thanks for all your replies.

Just to clarify...

"ASCII is 7-bit, and any character above 127 will need the proper encoding to be read right. I'm assuming the characters are stored as 8 bit."

Sorry for being imprecise. When I said "Ascii (ie 8-bit) characters", I meant that they are stored as single bytes rather than as the 16-bit units used by Unicode strings. From a quick glance the Ascii characters all look 7-bit, but really I ought to write a quick test programme to check this, one that would find and flag up non-ASCII characters. Then I could try to deduce what the encoding is.

The original DOS-based [stock control] programme that created the files was written by a UK company, and the software was used on a PC in the UK to generate them. Is there a standard encoding for the UK? Is this likely to be extended ASCII, for which Göran Andersson suggested using Encoding.GetEncoding(850)? Thanks, Jon Skeet, for your comment about extended Ascii. I suppose there could be things like a letter e with an acute accent, and I don't want to mangle these. Once I've discovered the encoding, I'm pretty certain it will be consistent across all the files.
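The test programme described above could look roughly like the following sketch. The names are hypothetical. The files also contain binary ints and bytes, and those routinely have values above 0x7F, so the check should be run over the bytes of the string fields only. Running it over the whole file would flag those numbers too.

```csharp
using System.Collections.Generic;

// Sketch: flag every byte that is not plain 7-bit ASCII (value above 0x7F).
// Intended to be run on string-field bytes only; binary number fields
// will legitimately contain such values.
public static class AsciiCheck
{
    // Returns the offset and value of each non-ASCII byte, in order.
    public static List<(int Offset, byte Value)> FindNonAscii(byte[] data)
    {
        var hits = new List<(int Offset, byte Value)>();
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] > 0x7F)
                hits.Add((i, data[i]));
        }
        return hits;
    }
}
```

Looking up the flagged values in the candidate code page tables (for example, 0x9C is "£" in both 437 and 850) is usually enough to narrow down the encoding.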

Göran Andersson: "Does the length include the padding or not?"

It does include padding, so a string with three characters appears as the byte 0x14 (i.e. 20), followed by the three characters and then 17 space characters. I will trim the string.
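For this layout, decoding a field means reading exactly `length` bytes after the length byte and then trimming the trailing spaces. The sketch below passes the encoding in as a parameter because it has not been confirmed yet, and the class name is hypothetical. It uses `TrimEnd(' ')` rather than `Trim()` so that any leading spaces that are really part of the data are kept.

```csharp
using System.Text;

// Sketch: 'field' starts at the length byte, which already counts the padding.
// 'enc' should be a single-byte encoding (code page 850 is an assumption
// until the real encoding is known).
public static class PaddedField
{
    public static string Decode(byte[] field, Encoding enc)
    {
        int length = field[0];                        // 0, 20 or 40 in these files
        string raw = enc.GetString(field, 1, length); // characters plus padding
        return raw.TrimEnd(' ');                      // drop only the trailing spaces
    }
}
```

For the example above (0x14, "abc", 17 spaces), this returns "abc". A length byte of 0 returns an empty string.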



"John" <-> wrote in message news:OMR74pooGHA.3304@xxxxxxxxxxxxxxxxxxxxxxx
Hi,

I'm a beginner at using C# and .NET.

I have big legacy files that store various values (ints, bytes, strings) and want to read them into
a C# programme so that I can store them in a database. The files were written by a late-1980s PC
Pascal programme, for which I don't have the source code. I've managed to reverse-engineer the file
format.

The strings are stored as Ascii in the file, with the first byte indicating the string length and
the remaining bytes holding the Ascii (ie 8-bit) characters. The string length is always 0, 20 or 40
characters (never any more), and strings are end-padded with space characters where necessary.

What is the best way to quickly read a string and get rid of the space padding at the end? To make
sure I can read them correctly, I'll put them in a text box. I assume the string used in a text box
uses 16-bit characters (Unicode?), but I may be wrong here. When I'm happy I can read them correctly,
I'll get rid of the text box and store them directly in the database. Is it best to store them in the
database as Unicode? I'm tempted to use Ascii for efficiency.
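.NET strings, including the `Text` of a text box, are sequences of 16-bit UTF-16 code units, so that assumption is right. Whether an ASCII-only column is safe depends on the data. One option is to check every decoded string before choosing the column type. The sketch below does that; the helper name is hypothetical.

```csharp
using System.Linq;

// Sketch: true only when every UTF-16 code unit is in the 7-bit ASCII range
// (U+0000 to U+007F), i.e. when storing the value as ASCII would lose nothing.
public static class StorageCheck
{
    public static bool IsPureAscii(string s) => s.All(c => c <= '\u007F');
}
```

If any string decoded from the files contains something like "é" or "£", this returns false. In that case, an ASCII column would mangle the value.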

I was thinking of using a binary reader (_br) to extract from the file. That should be fine for
everything, but I don't know how to cope with the Ascii strings.
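With a `BinaryReader`, one approach is to read the length byte and then exactly that many bytes. `BinaryReader.ReadString()` is not a good fit here. It expects a 7-bit-encoded length prefix and decodes with the reader's own encoding, which is UTF-8 unless another one is passed to the constructor. This sketch assumes each string occupies exactly 1 + length bytes in the file. If the Pascal programme reserved a fixed-size slot regardless of the length byte, the rest of the slot would also need skipping. `LegacyReader` and `ReadPaddedString` are hypothetical names.

```csharp
using System.IO;
using System.Text;

public static class LegacyReader
{
    // Reads one length-prefixed, space-padded string from the reader's current
    // position and leaves the reader positioned just after it.
    // Assumption: the field takes exactly 1 + length bytes on disk.
    public static string ReadPaddedString(BinaryReader br, Encoding enc)
    {
        int length = br.ReadByte();          // length byte, includes padding
        byte[] bytes = br.ReadBytes(length); // may return fewer bytes at end of stream
        if (bytes.Length != length)
            throw new EndOfStreamException("String field is truncated.");
        return enc.GetString(bytes).TrimEnd(' ');
    }
}
```

Calls like `_br.ReadInt16()` or `_br.ReadByte()` for the numeric fields can be interleaved with `ReadPaddedString(_br, enc)`. Note that `BinaryReader` reads integers little-endian, which matches PC files.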





