Re: UTF-8 encoding in AJAX web application.
- From: stcheng@xxxxxxxxxxxxxxxxxxxx (Steven Cheng[MSFT])
- Date: Wed, 21 Mar 2007 09:59:32 GMT
Thanks for your reply, Allan, and thanks also for Jon's input.
Actually, the knowledge needed to explain the problem here is specific to
string/text charsets and encodings.
Let me try to answer the questions:
#First, UTF-8, UTF-16 and UCS-2 are all encoding schemes for the Unicode
character set. In other words, a Unicode character string can be encoded
into a binary stream using any one of them.
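To make that concrete, here is a small sketch in Python (used only for
illustration; the thread itself is about .NET and SQL Server). The sample
string is my own choice. It shows one string producing different byte
sequences under UTF-8 and UTF-16, both of which decode back to the same
characters.

```python
# One Unicode string, two binary encodings of it.
text = "A\u00e9\u20ac"  # 'A', 'e' with acute accent, euro sign

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian, no byte-order mark

print(len(utf8_bytes), utf8_bytes)
print(len(utf16_bytes), utf16_bytes)

# Different bytes on the wire, identical characters after decoding.
assert utf8_bytes != utf16_bytes
assert utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le") == text
```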
Is this done by detecting the UTF-8 preamble? And then the driver converts to
UCS-2? And if so how come the result is still in UTF-8 when I retrieve the
data again?
Or is there no conversion?
======================
First, UTF-8 is the encoding that your web page and the client browser use
to transfer the Unicode characters. UTF-8 is a variable-length encoding (one
to four bytes per character), so it is more compact, and therefore cheaper
to transfer, when the data consists mostly of ASCII characters; UTF-16 and
UCS-2 use at least two bytes for every character. By the time your .NET
code has received the string, it has already been converted to UTF-16,
because .NET always uses UTF-16 to represent characters in memory. When you
then use ADO.NET to submit string/character data to the SQL Server database,
it is this in-memory Unicode string that is submitted, not the original
UTF-8 bytes.
On the SQL Server side, it simply receives the Unicode characters from the
client and stores them in the target table column. A problem can occur here
depending on the column's type. If it is a Unicode type (e.g. nvarchar,
nchar, ntext ...), SQL Server can store the characters correctly (persisted
in UCS-2 encoding). If the column is not a Unicode character type, SQL
Server uses the code page of the column/table/database's current collation
to encode the Unicode characters into a binary stream (such a code page is a
single-byte or multi-byte legacy charset), and characters that the code page
cannot represent are lost.
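The flow described above can be sketched roughly as follows. This is a
Python analogy, not ADO.NET or SQL Server code: the function names are
hypothetical, and code page 1252 is only an assumed example of a collation's
code page.

```python
def receive_from_browser(body: bytes) -> str:
    """Decode the UTF-8 request body into an in-memory Unicode string."""
    return body.decode("utf-8")


def store_in_unicode_column(value: str) -> bytes:
    """Roughly what an nvarchar/ntext column keeps: two-byte code units."""
    return value.encode("utf-16-le")


def store_in_code_page_column(value: str, code_page: str = "cp1252") -> bytes:
    """Roughly what a varchar/text column keeps: bytes in the collation's
    code page. Characters outside that code page become '?'."""
    return value.encode(code_page, errors="replace")


posted = "S\u00f8ren \u4f60\u597d".encode("utf-8")  # bytes sent by the browser
value = receive_from_browser(posted)

unicode_stored = store_in_unicode_column(value)
legacy_stored = store_in_code_page_column(value)

print(unicode_stored.decode("utf-16-le"))  # all characters survive
print(legacy_stored.decode("cp1252"))      # the Chinese characters are gone
```

Note that nothing in the Unicode path keeps the original UTF-8 bytes; the
text reads back correctly because the characters are preserved, not the
encoding.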
Why is it important that MSSQL only supports UCS-2 unicode if everything
works fine with UTF-8?
I can see that everything works fine when storing a UTF-8 string in an
ntext column, and when I query the data in queryanalyzer the string is
displayed correctly in the result set, how can this be if MSSQL only
supports UCS-2 encoding?
How is the string stored? in UTF-8 or UCS-2?
===========================
UTF-8 is compact when the data consists mainly of ASCII characters, but it
is less efficient to process than a fixed-width encoding because it uses a
different number of bytes for different characters. UCS-2 always uses two
bytes per character; UTF-16 does too for characters in the Basic
Multilingual Plane and uses four bytes (a surrogate pair) for the rest. SQL
Server 2000 is also an old product, and UCS-2 was the preferred encoding at
the time it was designed. In any case, there is no strictly right or wrong
choice of encoding for SQL Server here.
Therefore, all Unicode text columns are persisted in UCS-2 encoding (in
memory and in the data file), not in UTF-8. When a client application
queries the data, it can be retrieved and processed correctly as long as the
client application supports Unicode. For example, the .NET Framework handles
Unicode characters correctly and stores them in memory as UTF-16.
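To put rough numbers on the size trade-off between the encodings, here is a
small Python sketch. The helper name and the sample strings are my own and
are only there for illustration.

```python
def encoded_sizes(s: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for a string."""
    return len(s.encode("utf-8")), len(s.encode("utf-16-le"))


samples = {
    "ascii": "hello world",
    "danish": "bl\u00e5b\u00e6rgr\u00f8d",
    "chinese": "\u4f60\u597d\u4e16\u754c",
    "emoji": "\U0001F600",  # outside the BMP: a surrogate pair in UTF-16
}

for name, s in samples.items():
    utf8_len, utf16_len = encoded_sizes(s)
    print(f"{name}: {len(s)} chars, UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

ASCII-heavy text comes out about half the size in UTF-8, while East Asian
text is smaller in UTF-16, which is why neither encoding is simply the
better one.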
In addition, here is a good reference on the MS globaldev site that
introduces charsets and encodings:
#Globalization Step-by-Step
http://www.microsoft.com/globaldev/getWR/steps/wrg_codepage.mspx
I'm not sure whether I've missed anything from your earlier reply. If you
have any further specific questions on this, please feel free to post here.
Sincerely,
Steven Cheng
Microsoft MSDN Online Support Lead
This posting is provided "AS IS" with no warranties, and confers no rights.