Re: Unicode to ASCII string conversion

From: Jay B. Harlow [MVP - Outlook] (Jay_Harlow_MVP_at_msn.com)
Date: 09/14/04


Date: Tue, 14 Sep 2004 15:52:45 -0500

Cor,
Read my post. ;-) I only discussed reading & writing strings to ASCII, ANSI,
and UTF8 files (7 & 8 bit encodings).

You are correct: System.String & System.Char are UTF-16 (16-bit Unicode),
while files can be ANSI, ASCII, UTF7, UTF8, EBCDIC, UTF16, or many other
encodings.
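
To make the in-memory vs. on-disk distinction concrete, here is a minimal sketch that counts the bytes the same String needs in a few encodings. The module and function names (EncodingSizes, ByteCount) are made up for illustration, and ChrW(&HEB) is used for the ë so the result does not depend on how the source file itself is saved.

```vbnet
Imports System
Imports System.Text

Module EncodingSizes
    ' Number of bytes the given encoding needs for the text (no byte order mark).
    Function ByteCount(text As String, enc As Encoding) As Integer
        Return enc.GetByteCount(text)
    End Function

    Sub Main()
        Dim sample As String = "abcd" & ChrW(&HEB) & "fg"        ' "abcdëfg"
        Console.WriteLine(sample.Length)                         ' 7 UTF-16 Chars in memory
        Console.WriteLine(ByteCount(sample, Encoding.ASCII))     ' 7 (ë becomes a single "?")
        Console.WriteLine(ByteCount(sample, Encoding.UTF8))      ' 8 (ë takes two bytes)
        Console.WriteLine(ByteCount(sample, Encoding.Unicode))   ' 14 (two bytes per Char)
    End Sub
End Module
```

The String is the same in every case; only the byte representation chosen when it is encoded differs.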

FWIW: VS.NET 2005 (.NET 2.0, aka Whidbey, due out in 2005) appears to
support UTF-32 encoding for files.

http://msdn2.microsoft.com/library/ts575t62.aspx

Hope this helps
Jay

"Cor Ligthert" <notfirstname@planet.nl> wrote in message
news:%23DvHNymmEHA.3968@TK2MSFTNGP11.phx.gbl...
> Jay,
>
> Ger's answer has made me curious. I did not see it in your message, so
> what is the solution? Ger said he wanted a straight string-to-string
> conversion and explicitly no byte array. Do I now understand that he can
> convert Unicode to an 8-bit ANSI String in VB.NET? (I am not talking
> about writing a file with 8-bit chars by encoding the chars.)
>
> I showed in this thread, with a link to an MSDN page, that a String always
> contains 16-bit Chars.
>
> Is that documentation wrong, do I not understand it, or is it maybe
> something completely different?
>
> Cor
>
>
>
>
> "Jay B. Harlow [MVP - Outlook]" <Jay_Harlow_MVP@msn.com> wrote in message
> news:%23RwyVzlmEHA.3328@TK2MSFTNGP10.phx.gbl...
>> Ger,
>> > Ah, now I think I get the idea. So when I convert a (Unicode) string into
>> > an ASCII byte array, and then the byte array back into a string, I still
>> > have Unicode, right?
>> Correct, just remember that you will lose some characters going to &
>> from ASCII.
>>
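
The round trip discussed in the quoted exchange above can be sketched as follows. This is only an illustration: the names AsciiRoundTrip, ToAsciiBytes and FromAsciiBytes are hypothetical, and it assumes the default behavior of Encoding.ASCII, which substitutes "?" for characters it cannot represent.

```vbnet
Imports System
Imports System.Text

Module AsciiRoundTrip
    ' String -> ASCII bytes; characters above U+007F become &H3F ("?").
    Function ToAsciiBytes(text As String) As Byte()
        Return Encoding.ASCII.GetBytes(text)
    End Function

    ' ASCII bytes -> String; the result is an ordinary UTF-16 String again.
    Function FromAsciiBytes(bytes As Byte()) As String
        Return Encoding.ASCII.GetString(bytes)
    End Function

    Sub Main()
        Dim original As String = "abcd" & ChrW(&HEB) & "fg"   ' "abcdëfg"
        Dim bytes As Byte() = ToAsciiBytes(original)
        Console.WriteLine(BitConverter.ToString(bytes))       ' 61-62-63-64-3F-66-67
        Console.WriteLine(FromAsciiBytes(bytes))              ' abcd?fg
    End Sub
End Module
```

The String that comes back is still Unicode, but the ë was already lost when the bytes were produced.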
>> > So that is of no use when you want to write ASCII to a
>> > filestream.
>> If you need an ASCII file, then use an ASCII encoding. It really depends
>> on what is going to read the file again.
>>
>> I would recommend an ANSI encoding (see below) or UTF8 encoding. With
>> ASCII you will lose all extended characters (ASCII is a 7-bit encoding);
>> with ANSI you will lose characters that are outside of your regional ANSI
>> code page. UTF8 preserves all Unicode characters. I would recommend ANSI
>> encoding if the file is going to be opened by casual users in Notepad. I
>> would recommend UTF8 if full Unicode support is required. ANSI & UTF8 are
>> both 8-bit encodings.
>>
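
A rough sketch of the three cases just described. Two assumptions to flag: ISO-8859-1 stands in for a Western "ANSI" code page (a real Windows code page such as 1252 differs in detail), and the replacement fallback is requested explicitly so that unmappable characters reliably become "?". The names AnsiVersusAscii, Latin1WithQuestionMark and RoundTrip are invented for the example.

```vbnet
Imports System
Imports System.Text

Module AnsiVersusAscii
    ' ISO-8859-1 as a stand-in for an ANSI code page (assumption); the explicit
    ' replacement fallback turns characters outside the code page into "?".
    Function Latin1WithQuestionMark() As Encoding
        Return Encoding.GetEncoding("iso-8859-1",
                                    New EncoderReplacementFallback("?"),
                                    New DecoderReplacementFallback("?"))
    End Function

    ' Encodes the text to bytes and decodes it again with the same encoding.
    Function RoundTrip(text As String, enc As Encoding) As String
        Return enc.GetString(enc.GetBytes(text))
    End Function

    Sub Main()
        ' "e", then ë (U+00EB), then the euro sign (U+20AC, not in ISO-8859-1).
        Dim sample As String = "e" & ChrW(&HEB) & ChrW(&H20AC)
        ' Booleans are printed so the output does not depend on the console code page.
        Console.WriteLine(RoundTrip(sample, Encoding.ASCII) = "e??")
        Console.WriteLine(RoundTrip(sample, Latin1WithQuestionMark()) = "e" & ChrW(&HEB) & "?")
        Console.WriteLine(RoundTrip(sample, Encoding.UTF8) = sample)
    End Sub
End Module
```

ASCII keeps only the plain e, the code page keeps the ë but not the euro sign, and UTF8 keeps everything.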
>>
>> > Is the code below then writing ASCII output to my filestream?
>>
>> Yes that code is writing ASCII, as you included the ASCII encoding on the
>> StreamWriter constructor.
>>
>> The text file itself will contain ASCII characters. When you subsequently
>> open that text stream and read it (with a StreamReader), it will be
>> converted back to Unicode strings. When reading the file back, try to use
>> the same encoding as was written. For example, if you wrote ANSI, then use
>> ANSI to read; if you wrote UTF8, then use UTF8 to read, as ANSI & UTF8
>> encode characters 128 to 255 differently. Remember that Encoding.UTF8 is
>> used on the StreamWriter if you do not give one; if you are reading text
>> files created by Notepad, then you want Encoding.Default.
>>
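
To show why the reader's encoding should match the writer's, here is a minimal sketch using a temporary file. Assumptions: ISO-8859-1 again stands in for an ANSI code page, UTF8 is written without a byte order mark, and BOM detection is switched off on the reader so that the encoding passed in is the one actually used. WriteThenRead and the module name are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module MatchingEncodings
    ' Writes text to a temporary file with writeEnc, then reads it back with readEnc.
    Function WriteThenRead(text As String, writeEnc As Encoding, readEnc As Encoding) As String
        Dim tempFile As String = Path.GetTempFileName()
        Try
            Using writer As New StreamWriter(tempFile, False, writeEnc)
                writer.Write(text)
            End Using
            ' False = do not detect a byte order mark; decode with readEnc as given.
            Using reader As New StreamReader(tempFile, readEnc, False)
                Return reader.ReadToEnd()
            End Using
        Finally
            File.Delete(tempFile)
        End Try
    End Function

    Sub Main()
        Dim sample As String = "abcd" & ChrW(&HEB) & "fg"                 ' "abcdëfg"
        Dim utf8NoBom As New UTF8Encoding(False)
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")       ' ANSI stand-in
        Console.WriteLine(WriteThenRead(sample, utf8NoBom, utf8NoBom) = sample)   ' True
        Console.WriteLine(WriteThenRead(sample, latin1, latin1) = sample)         ' True
        Console.WriteLine(WriteThenRead(sample, utf8NoBom, latin1) = sample)      ' False
    End Sub
End Module
```

In the mismatched case the two UTF8 bytes for ë (C3 AB) are read back as two separate characters, which is the usual symptom of reading a file with the wrong encoding.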
>> I would recommend:
>>
>> > Dim wOutput As New StreamWriter(fsOutput, System.Text.Encoding.Default)
>>
>> This will write the file in your current ANSI code page, as defined by
>> the regional settings in Windows Control Panel, which will preserve
>> extended characters.
>>
>> Remember that ANSI is an 8-bit encoding that depends on region (code
>> page), while ASCII is a 7-bit encoding. ASCII does not support extended
>> characters such as ë; it will be converted into either a normal e or a ?.
>>
>> Hope this helps
>> Jay
>>
>> "Ger" <ger.rietman@rathernospam.sailsoft.nl> wrote in message
>> news:uU3WK3kmEHA.2772@tk2msftngp13.phx.gbl...
>> > Ah, now I think I get the idea. So when I convert a (Unicode) string into
>> > an ASCII byte array, and then the byte array back into a string, I still
>> > have Unicode, right? So that is of no use when you want to write ASCII to
>> > a filestream.
>> >
>> > Is the code below then writing ASCII output to my filestream?
>> >
>> > Dim UnicodeString As String = "abcdëfg"
>> > Dim fsOutput as New FileStream(..)
>> > Dim wOutput As New StreamWriter(fsOutput, System.Text.Encoding.ASCII)
>> > wOutput.WriteLine(UnicodeString)
>> >
>> > Thank you for your reply.
>> >
>> > /Ger
>> >
>> >
>> > "Cor Ligthert" <notfirstname@planet.nl> wrote in message
>> > news:eWAgM%23imEHA.3564@tk2msftngp13.phx.gbl...
>> >> Ger,
>> >>
>> >> > Thanks for your reply, but this returns a byte array. I meant
>> >> > straightforward string-to-string conversion. It is possible of
>> >> > course to write a simple function to do this using the Encoding
>> >> > class, but I was just wondering why the framework does not support
>> >> > the "direct string-to-string".
>> >>
>> >> In .NET, a "String" is always a string of Unicode Chars. What you
>> >> call an "ASCII string" is always a byte array.
>> >>
>> >> Therefore, as an answer, there is nothing more than what Herfried
>> >> suggested. You could create an array of objects that contains bytes;
>> >> however, that is no solution in my opinion.
>> >>
>> >> I hope this helps to get the idea?
>> >>
>> >> Cor
>> >>
>> >>
>> >>
>> >
>> >
>>
>>
>
>


