Re: Convert Encoding from Shift-JIS to UTF-8

From: Jon Skeet [C# MVP] (skeet_at_pobox.com)
Date: 10/28/04


Date: Thu, 28 Oct 2004 19:26:22 +0100

DbNetLink <robin@____dbnetlink.co.uk> wrote:
> >> If you want the UTF-8 encoded bytes, just use Encoding.UTF8.GetBytes(S)
>
> Is that not what I am doing in the line:
>
> Response.Write( SourceEncoding.GetString( TargetEncoding.GetBytes(
> S ) ) );

No. You're encoding the string as UTF-8, but then decoding the resulting
bytes as if they were a valid Shift-JIS-encoded byte array.
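
To make the problem concrete, here is a minimal sketch of what that line
does. The class and method names (RoundTripDemo, WrongRoundTrip) are
hypothetical, and the provider registration is an assumption for modern
.NET (.NET Core / .NET 5+), where Shift-JIS is not available by default;
on the .NET Framework that call is normally unnecessary.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // Mirrors the pattern in the question: encode with the target encoding,
    // then decode those bytes with the (unrelated) source encoding.
    public static string WrongRoundTrip(string s, Encoding source, Encoding target)
    {
        byte[] targetBytes = target.GetBytes(s);   // e.g. UTF-8 bytes
        return source.GetString(targetBytes);      // misread as Shift-JIS
    }

    public static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where code page
        // encodings such as shift_jis must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding shiftJis = Encoding.GetEncoding("shift_jis");

        string original = "\u65E5\u672C\u8A9E";   // three Japanese characters
        string mangled = WrongRoundTrip(original, shiftJis, Encoding.UTF8);

        // The UTF-8 bytes are not a Shift-JIS encoding of the same text,
        // so the decoded string no longer matches the original.
        Console.WriteLine(original == mangled);
    }
}
```

Plain ASCII text such as "abc" passes through this round trip unchanged,
which can hide the bug until non-ASCII characters show up.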
 
> Given the earlier line:
>
> Encoding TargetEncoding = Encoding.UTF8;
>
> I did read the link but was unable to relate it directly to my problem of
> converting one encoding to another using .Net.

It gives the fundamentals, which should explain why the line of code at
the top is a really bad idea.

> If it is simply down to an error in my code perhaps you could point it out
> as I have already spent 2 days on trying to understand what I am doing wrong
> and would love to be put out of my misery :(

You should just be able to use the string, without venturing into
encodings at all.
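
The reason is that a .NET string is already Unicode, so an encoding only
matters at the point where bytes become a string or a string becomes
bytes. A rough sketch of both directions follows, assuming the input
really is Shift-JIS bytes; TranscodeDemo and its methods are hypothetical
names, and the provider registration is again an assumption for modern
.NET only.

```csharp
using System;
using System.Text;

static class TranscodeDemo
{
    // Bytes -> string: decode with the encoding the bytes were written in.
    public static string Decode(byte[] sourceBytes, Encoding source)
    {
        return source.GetString(sourceBytes);
    }

    // Bytes -> bytes: Encoding.Convert decodes with the source encoding
    // and re-encodes with the destination encoding in one call.
    public static byte[] ToUtf8(byte[] sourceBytes, Encoding source)
    {
        return Encoding.Convert(source, Encoding.UTF8, sourceBytes);
    }

    public static void Main()
    {
        // Assumption: needed on .NET Core / .NET 5+ to make shift_jis available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding shiftJis = Encoding.GetEncoding("shift_jis");

        string original = "\u65E5\u672C\u8A9E";
        byte[] shiftJisBytes = shiftJis.GetBytes(original);   // stand-in for real input

        string text = Decode(shiftJisBytes, shiftJis);
        byte[] utf8 = ToUtf8(shiftJisBytes, shiftJis);

        Console.WriteLine(text == original);
        Console.WriteLine(Encoding.UTF8.GetString(utf8) == original);
    }
}
```

Once you have the string, writing it with Response.Write should be enough;
the response's own content encoding decides which bytes go on the wire.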

If that's not working, you need to work through it step by step - see
http://www.pobox.com/~skeet/csharp/debuggingunicode.html
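
One way to work through it step by step is to print the Unicode value of
every character at each stage (after reading the data, before writing it)
and see where the values stop being what you expect. This is only a
sketch of that idea, not a summary of the linked page; UnicodeDump and
Describe are hypothetical names.

```csharp
using System;
using System.Text;

static class UnicodeDump
{
    // Renders each UTF-16 code unit of the string as U+XXXX, so the
    // contents can be checked without relying on fonts or the console.
    public static string Describe(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0)
            {
                sb.Append(' ');
            }
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // prints: U+65E5 U+672C U+8A9E
        Console.WriteLine(Describe("\u65E5\u672C\u8A9E"));
    }
}
```

If the values are already wrong right after the data is read, the problem
is in the decoding step; if they are right until output, look at how the
response is encoded.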

-- 
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

