Problems with UTF-8, Unicode, ADO and MySQL



I am running a multilingual ASP site off a MySQL database. All my data
is stored in UTF-8 in order to accommodate a wide range of
characters; in fact, the whole db is utf8. This all works fine on
the W2000 server running IIS 5. Text can be saved to the db in any
language and is displayed correctly when retrieved and sent to a
browser (with charset utf-8). However, when I copy my ASP files to the
W2003 server, making no changes, and then try to display the text from
the same database (still on the W2000 server), it simply returns a load
of "???" for non-Latin characters.
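For illustration only (Python, not part of the ASP setup): one common way a run of "?" arises is when text passes through a conversion to a code page that cannot represent the characters. The sketch below assumes Windows-1252 as that code page; whether this is what actually happens on the W2003 server is not confirmed.

```python
# Illustration only: Cyrillic text forced into Windows-1252.
# Assumption: some layer converts to cp1252 and substitutes "?"
# for anything it cannot represent.
text = "\u0414\u0430"  # the Cyrillic letters De and A

lossy = text.encode("cp1252", errors="replace")

print(lossy)  # b'??'
```

Once the characters have been replaced like this, the original text cannot be recovered from the "?" bytes.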

I have tried playing around with the Server.Codepage settings (which
seem to have little effect) and I have saved all my ASP files in UTF-8
format. None of this seems to help.

Also, I have gone to the trouble of retrieving my data with
phpMyAdmin to check that it really is stored as UTF-8, and this seems
to work fine.

Some investigation has brought up the following strange behaviour (well
I think so anyway).

On IIS5:

With Session.Codepage = 1252, submitting "A pound sign: £ should
appear here" in a form field results in:

"A pound sign: Â£ should appear here"

in the mysql database.

Given that I don't think my SQL client (SQLyog) can display Unicode,
this would appear to be correct, according to the following article,
which states that £ in UTF-8 would be displayed as Â£ in a Latin
charset.

http://czyborra.com/utf/#UTF-8
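
The article's point can be checked with a short sketch (Python is used here purely for illustration and is not part of the ASP setup): the two UTF-8 bytes of £ show up as two separate characters in a viewer that assumes Latin-1.

```python
# Illustration only: how the UTF-8 bytes of the pound sign look
# in a viewer that assumes Latin-1.
pound = "\u00a3"                                # the pound sign

utf8_bytes = pound.encode("utf-8")              # two bytes: 0xC2 0xA3
seen_as_latin1 = utf8_bytes.decode("latin-1")   # each byte shown as one character

print(utf8_bytes.hex())   # c2a3
print(seen_as_latin1)     # Â£
```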

If I now change the codepage to 65001 (UTF-8), then only the following
is stored:

"A pound sign:"

i.e. the £ and all characters after it are cut off.

Repeating this whole exercise on IIS6 gives this behaviour:

For Session.Codepage = 1252, the following is stored in the db:

"A pound sign: Ã‚Â£ should appear here"

This, I believe (going by the site mentioned above), is an extra
conversion: a UTF-8 string is interpreted as Latin and then converted
into UTF-8 again, i.e. the MySQL db thinks the string to be stored is:
"A pound sign: Â£ should appear here"
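
The suspected extra conversion can be sketched as follows (Python, for illustration only). The assumption, not confirmed anywhere in the setup, is that some layer misreads the UTF-8 bytes as Windows-1252 and then encodes the result as UTF-8 a second time.

```python
# Illustration only: double encoding of the pound sign.
# Assumption: the misreading layer uses Windows-1252 (cp1252).
pound = "\u00a3"

once = pound.encode("utf-8")         # correct UTF-8: c2 a3
misread = once.decode("cp1252")      # bytes taken for two Latin characters
twice = misread.encode("utf-8")      # encoded again: c3 82 c2 a3

# Undoing the extra step recovers the original character.
repaired = twice.decode("utf-8").encode("cp1252").decode("utf-8")

print(once.hex(), twice.hex())       # c2a3 c382c2a3
print(repaired == pound)             # True
```

If the stored bytes match the second form, the data is double-encoded rather than lost, and the reverse conversion shown in `repaired` restores it.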

And for Session.Codepage = 65001 I get:

"A pound sign: £ should appear here"

This *appears* to work, but if I try putting in a Cyrillic character
(e.g. Д - not sure if this will display) with codepage = 65001, then
the db stores:

"Ãxâ€x" for the Cyrillic character (where x = a byte the font
of my SQL client can't display), which is surely too many bytes for
UTF-8.
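
The five-character pattern above is what a double encoding of Д would produce, as this sketch shows (Python, for illustration only). It assumes the intermediate misreading uses Windows-1252, which fits the pattern but is not confirmed.

```python
# Illustration only: double encoding of the Cyrillic letter De.
# Assumption: the intermediate misreading uses Windows-1252 (cp1252).
de = "\u0414"

once = de.encode("utf-8")            # correct UTF-8: d0 94 (2 bytes)
misread = once.decode("cp1252")      # taken for two cp1252 characters
twice = misread.encode("utf-8")      # encoded again: c3 90 e2 80 9d (5 bytes)

# Bytes 0x90 and 0x9D have no cp1252 character, so a cp1252 viewer
# cannot draw them; errors="replace" marks them with U+FFFD.
shown = twice.decode("cp1252", errors="replace")

print(len(once), len(twice))         # 2 5
print(twice.hex())                   # c390e2809d
```

Here `shown` has five characters, the second and fifth being undisplayable, which lines up with the two "x" positions in the stored value.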

The locale on the 2003 server is set to UK, which might be why IIS6
understands the £ character. But my impression is that the server runs
in Unicode and should therefore understand all characters. My
gut feeling is that (with codepage = 65001) I'm having trouble making
ADO (in IIS5 and IIS6) understand that the characters are already in
UTF-8 when they are received from the browser and that they don't need
converting again. Does this make sense?

Any help in this matter is much appreciated.

Thanks,
Robin
