Re: asp.net chinese encoding
From: Jon Skeet [C# MVP] (skeet_at_pobox.com)
Date: 10/27/04
- Next message: Jon Skeet [C# MVP]: "RE: C# & VB-> General Questions"
- Previous message: Alberto: "Big trouble with connections"
- In reply to: pabv: "Re: asp.net chinese encoding"
- Messages sorted by: [ date ] [ thread ]
Date: Wed, 27 Oct 2004 08:45:16 +0100
pabv <pablo.venegas@gmail.com> wrote:
> > pabv <pablo.venegas@gmail.com> wrote:
> > > I realised (finally !) that I am connecting to a mssql backend with
> > > collation 1252.
> >
> > Hmm... I don't know details about what the collation does, but it can't
> > be restricting it to CP 1252, otherwise you wouldn't have any Chinese
> > characters at all.
>
> I use collation 1252 (SQL_Latin1_General_CP1_CI_AS). I think the
> collation specifies the character set and sort order used by the mssql
> server.
>
> I may be incorrect but from my understanding the collation does
> restrict the characters stored in mssql to be codepage 1252.
>
> As a small example, for two characters stored in the database at the
> moment it stores the codepage 1252 symbols for U+00D5 (D5 = U+00D5 :
> LATIN CAPITAL LETTER O WITH TILDE) and U+00CA (CA = U+00CA : LATIN
> CAPITAL LETTER E WITH CIRCUMFLEX) which for reference can be seen at
> http://www.microsoft.com/globaldev/reference/sbcs/1252.htm
If it can only store those characters, then it can't possibly store any
Chinese characters, can it?
> Then to render the chinese characters on to the browser, by sending
> the charset gb2312 the browser does a codepage lookup of codepage 936.
> Combining the two characters above it displays the chinese symbol
> (D5CA = U+5E10 : CJK UNIFIED IDEOGRAPH) which can be seen at
> http://www.microsoft.com/globaldev/reference/dbcs/936/936_D5.htm
No - you don't combine two CP-1252 characters to get a double-byte
character - you combine two *bytes*.
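As a minimal sketch of the difference, using Python purely for illustration (the codec names "cp1252" and "gb2312" are Python's, not anything from ASP.NET), the same two bytes decode to two characters under one encoding and to one character under the other:

```python
# The two bytes from the example above, as stored or sent on the wire.
raw = bytes([0xD5, 0xCA])

# Decoded as CP-1252, each byte becomes its own character.
as_cp1252 = raw.decode("cp1252")

# Decoded as GB2312, the same two bytes form one double-byte character.
as_gb2312 = raw.decode("gb2312")

print([f"U+{ord(c):04X}" for c in as_cp1252])  # two code points
print([f"U+{ord(c):04X}" for c in as_gb2312])  # one code point
```

The bytes never change; only the interpretation applied to them does.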
> This is what I understand the process to be. Is this correct? Perhaps
> you may be able to correct me if I have mis-understood this process of
> displaying the chinese characters onto the browser.
You still need to separate the database element from the browser
element. You haven't worked out for sure (as far as I've seen) whether
the database is the problem, or the browser. When you've eliminated one
of them, it doesn't need to appear again.
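One way to check the database side on its own is to dump the code points of a string exactly as the application receives it. The sketch below is in Python for illustration; the helper names code_points and looks_like_cp1252_mojibake are hypothetical, and the second one is only a heuristic (genuine Latin text such as "ÕÊ" would also trigger it):

```python
def code_points(text):
    """Return the Unicode code points of a string, e.g. ['U+5E10']."""
    return [f"U+{ord(ch):04X}" for ch in text]


def looks_like_cp1252_mojibake(text):
    """Heuristic: True if the text, re-encoded as CP-1252, is valid GB2312
    containing at least one CJK ideograph."""
    try:
        recovered = text.encode("cp1252").decode("gb2312")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return any(0x4E00 <= ord(ch) <= 0x9FFF for ch in recovered)


# Hypothetical value read back from the database.
from_db = "\u00d5\u00ca"
print(code_points(from_db))
print(looks_like_cp1252_mojibake(from_db))
```

If the dump shows U+00D5 U+00CA where a single Chinese character was expected, the string was already wrong before it reached the page, and the browser can be left out of the investigation.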
> > > This will allow datagrid, textboxes to correctly display chinese
> > > symbol characters on the browser. On the html source file, the text
> > > written to the file is as codepage 1252 characters
> >
> > Written to which file?
>
> Sorry, I meant the html source file, the current web page I am
> viewing. The html file has codepage 1252 characters and the browser
> then displays the correct chinese symbol characters.
The file has *bytes* in it, and the browser is being told to interpret
those bytes as GB2312. The idea of displaying a character as if it
were in a different character set from the actual one is a really bad
one.
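A rough illustration of why this is fragile, again in Python and relying on the behaviour of Python's own codecs (other implementations may treat the undefined CP-1252 bytes differently): the trick works for the D5 CA pair, but a character from the wider code page 936 (GBK) repertoire whose lead byte is 0x81 cannot be carried as CP-1252 text at all.

```python
# What the page "contains" if its bytes are read as CP-1252.
page_text = "\u00d5\u00ca"
wire_bytes = page_text.encode("cp1252")

# What the browser shows when told the charset is gb2312.
shown = wire_bytes.decode("gb2312")

# A character from the GBK extension whose first byte is 0x81.
gbk_bytes = "\u4e02".encode("gbk")

# 0x81 is undefined in Python's CP-1252 codec, so the bytes cannot be
# treated as CP-1252 characters and passed along that way.
try:
    gbk_bytes.decode("cp1252")
    survives = True
except UnicodeDecodeError:
    survives = False

print(shown, gbk_bytes.hex(), survives)
```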
> I am now unsure as to what I can do. The mssql backend is codepage
> 1252, but the characters should be displayed using codepage 936. I
> need to store both english and chinese characters for the site.
So for output, I'd suggest UTF-8, as that covers the whole of Unicode.
However, you need to work out what's actually in the database to start
with. If it's only meant to be storing CP1252 characters, but you want
it to store GB2312 characters, you need to work out exactly how you're
expecting that to happen.
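To illustrate the point about UTF-8, here is a small Python sketch (the sample text and the header string are made up for the example): a string mixing a Latin letter and a Chinese character encodes cleanly as UTF-8, while neither CP-1252 nor GB2312 can represent all of it.

```python
# Hypothetical page text mixing Latin and Chinese characters.
text = "\u00d5 and \u5e10"

# UTF-8 can encode every Unicode character.
utf8_body = text.encode("utf-8")

# The header a page would send so the browser decodes the body as UTF-8.
header = "Content-Type: text/html; charset=utf-8"


def can_encode(value, encoding):
    """Return True if every character in value exists in the encoding."""
    try:
        value.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(header)
print(utf8_body.hex(" "))
print(can_encode(text, "cp1252"), can_encode(text, "gb2312"))
```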
> > If you can make sure that you're getting the correct data back from the
> > database as Unicode characters, you can then ignore the database
> > character set entirely, and concentrate just on the ASP.NET part,
> > without even using a database.
>
> What should the data actually be stored as? Should it be storing as
> code 936? Should the chinese symbol characters be stored in the db?
So long as you've got the database in a mode where it *can* store those
characters, it shouldn't matter *how* it stores them. Set it up so it
can store any Unicode character, and it should be fine. You just read
and write strings, and they don't have encodings associated with them.
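As a sketch of what "just read and write strings" looks like, the Python example below uses an in-memory SQLite database as a stand-in for SQL Server (an assumption made only so the example runs anywhere; on SQL Server the corresponding step would be using Unicode column types such as nvarchar). The table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the real backend.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")

# A string mixing English, a Latin-1 letter, and a Chinese character.
original = "Account \u5e10 and \u00d5"

# Write the string through a parameter; no encoding is chosen by the caller.
conn.execute("INSERT INTO notes (body) VALUES (?)", (original,))

# Read it back as a string.
(stored,) = conn.execute("SELECT body FROM notes").fetchone()
conn.close()

print(stored == original)
```

The application code never names an encoding; how the database lays the characters out internally stays its own business.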
--
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too