Problems with UTF8, Unicode, ADO and MySQL
- From: "Robin" <rmassart@xxxxxxxxx>
- Date: 12 Sep 2005 07:52:45 -0700
I am running a multilingual ASP site off a MySQL database. All my data
is stored in UTF-8 in order to accommodate a wide range of
characters. In fact the whole DB is UTF-8. This all works fine on
the W2000 server running IIS 5. Text can be saved to the DB in any
language and is displayed correctly when retrieved and sent to a
browser (with charset utf-8). However, when I copy my ASP files to the
W2003 server, making no changes, and then try to display the text from
the same database (still on the W2000 server), it simply returns a load
of "???" for non-Latin characters.
I have tried playing around with the Server.Codepage settings (which
seem to have little effect) and I have saved all my ASP files in UTF-8
format. None of this seems to help.
Also, I have gone to the trouble of retrieving my data with
phpMyAdmin to ensure it really is stored as UTF-8, and it is displayed
correctly there.
Some investigation has brought up the following strange behaviour (well
I think so anyway).
On IIS5:
With Session.Codepage = 1252, submitting "A pound sign: £ should
appear here" in a form field results in:
"A pound sign: Â£ should appear here"
in the mysql database.
Given that I don't think my SQL client (SQLyog) can display Unicode,
this would appear to be correct, according to the following article,
which states that £ encoded as UTF-8 would be displayed as Â£ in a Latin
charset.
http://czyborra.com/utf/#UTF-8
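
A minimal sketch of that claim, in Python purely for illustration (the
site itself is ASP). It assumes the client renders stored bytes as
Windows-1252, which is a guess about the client rather than a confirmed
fact; the variable names are hypothetical.

```python
# Assumption: the SQL client shows raw bytes using Windows-1252 (cp1252).
pound = "\u00a3"                           # the pound sign, £

utf8_bytes = pound.encode("utf-8")         # how UTF-8 stores it
print(utf8_bytes.hex())                    # two bytes: c2a3

# Reading those two bytes as cp1252 yields two characters: Â followed by £
as_latin = utf8_bytes.decode("cp1252")
print(len(as_latin))                       # 2
```

If this assumption holds, seeing two characters in the client for one
pound sign is a sign of correctly stored UTF-8, not of corruption.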
If I now change the codepage to 65001 (UTF-8), then only the following
is stored:
"A pound sign:"
i.e. all characters from the £ onwards are cut off.
Repeating this whole exercise on IIS6 gives this behaviour:
For session.codepage = 1252, the following is stored in the DB:
"A pound sign: Ã‚Â£ should appear here"
This, I believe (going by the site mentioned above), is an extra
conversion into UTF-8 of a UTF-8 string that has been interpreted as
Latin, i.e. the MySQL DB thinks the string to be stored is:
"A pound sign: Â£ should appear here"
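
The suspected double conversion can be sketched as follows (Python for
illustration only; cp1252 as the "Latin" charset is an assumption, and
the variable names are hypothetical). It also shows that, if this is
what happened, the damage is reversible.

```python
original = "A pound sign: \u00a3 should appear here"

once = original.encode("utf-8")      # correct UTF-8 from the browser
misread = once.decode("cp1252")      # UTF-8 bytes mistaken for Latin text
twice = misread.encode("utf-8")      # converted to UTF-8 a second time

# What a Latin-only client would show for the doubly encoded bytes:
# the pound sign now appears as four characters.
shown = twice.decode("cp1252")
print(len(shown) - len(original))    # 3 extra characters

# Undoing the extra step recovers the original text.
repaired = twice.decode("utf-8").encode("cp1252").decode("utf-8")
print(repaired == original)          # True
```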
And for session codepage = 65001 I get:
"A pound sign: £ should appear here"
This *appears* to work, but if I try putting in a Cyrillic character
(e.g. Д, not sure if this will display) with codepage=65001, then the
DB stores:
"Ãxâ€x" for the cyrillic character (where x = a byte the font
of my sql client can't display), which is surely too many bytes for
utf-8.
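
For comparison, here is what a double conversion would produce for Д,
sketched in Python under the same assumption (cp1252 as the Latin
charset; variable names hypothetical). The result has five bytes, two of
which have no cp1252 glyph, which would match the pattern above.

```python
de = "\u0414"                         # Cyrillic capital De, Д

once = de.encode("utf-8")             # correct UTF-8
print(once.hex())                     # d094 (2 bytes)

misread = once.decode("cp1252")       # 0xD0 and 0x94 read as two Latin chars
twice = misread.encode("utf-8")       # encoded to UTF-8 a second time
print(twice.hex())                    # c390e2809d (5 bytes)

# Bytes 0x90 and 0x9D are undefined in cp1252, so they are replaced
# with U+FFFD here, standing in for the undisplayable "x" positions.
shown = twice.decode("cp1252", errors="replace")
print(len(shown))                     # 5
```

So a correctly stored Д would take only two bytes; five bytes is what
one extra Latin-to-UTF-8 conversion would yield.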
The locale on the 2003 server is set to UK, which might be why IIS6
understands the £ character. But my impression is that the server runs
in Unicode internally and should therefore understand all characters. My
gut feeling is that (with codepage=65001) I'm having trouble making ADO
(in IIS5 and IIS6) understand that the characters are already in UTF-8
when they are received from the browser and that they don't need
converting again. Does this make sense?
Any help in this matter is much appreciated.
Thanks,
Robin