Re: Unicode/UTF-8 decoding
- From: Göran Andersson <guffa@xxxxxxxxx>
- Date: Tue, 05 Jun 2007 17:21:06 +0200
Bill Nguyen wrote:
I set UTF-8 as the default encoding in MySQL.
I don't really know how this works, but IE and Firefox can decode it easily.
This is the test:
I put the lines below in an HTML document and viewed it in IE, and it worked. (Make sure to set the encoding to UTF-8 in the View menu.)
I include the test.htm for your testing. (The text is in Vietnamese).
So I think what I need is a utility that does the same decoding; one might already be available out there. Any help is greatly appreciated.
Bill
----------------
<html>
<head></head>
<body>
Virginia Hamilton Adair / Lâm Thị Mỹ Dạ
Lấp lánh há»"n thÆ¡ Việt trên sân ga Tokyo chiá»u cuối năm
</body>
</html>
"Göran Andersson" <guffa@xxxxxxxxx> wrote in message news:%23rhR3M0pHHA.1776@xxxxxxxxxxxxxxxxxxxxxxx
Bill Nguyen wrote:
Below is some text I extracted from a MySQL database. How can I decode it so that I can read it in Unicode?
Thanks
Bill
------------
Virginia Hamilton Adair / Lâm Thị Mỹ Dạ
Lấp lánh hồn thÆ¡ Việt trên sân ga Tokyo chiá»u cuối năm
This text looks as if it has been decoded with a different encoding than the one used to encode it. It might be possible to recreate the data if you know which encodings were used to encode and decode it. Then you might be able to encode it back to its previous state and use the proper encoding to decode it. There is a great risk that some data has been lost, though, and that you can't recreate the original data at this stage.
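As a rough illustration of "encode it back and decode it properly", the sketch below tries a few candidate encodings for the wrong decoding step and keeps those that round-trip cleanly. The thread never states which encodings were actually involved; `latin-1` and `cp1252` (Windows-1252) are assumptions, and `candidate_repairs` is a hypothetical helper name.

```python
def candidate_repairs(garbled, wrong_encodings=("latin-1", "cp1252"), right="utf-8"):
    """Return (encoding, repaired_text) for each candidate that round-trips."""
    results = []
    for wrong in wrong_encodings:
        try:
            # Undo the wrong decoding, then decode the bytes the right way.
            repaired = garbled.encode(wrong).decode(right)
        except (UnicodeEncodeError, UnicodeDecodeError):
            # This candidate cannot have produced the garbled text.
            continue
        results.append((wrong, repaired))
    return results


# Build a garbled sample the way the text in the post may have been produced
# (assumption: UTF-8 bytes were read as Windows-1252).
sample = "Việt".encode("utf-8").decode("cp1252")
print(candidate_repairs(sample))
```

A candidate that raises an error is ruled out, but a candidate that succeeds is only plausible, not proven, so the repaired text still needs to be checked by someone who can read it.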
If you want to store Unicode strings in the MySQL database, it has to be set up to use a Unicode character set.
--
Göran Andersson
_____
http://www.guffa.com
------------------------------------------------------------------------
> Virginia Hamilton Adair / Lâm Thị Mỹ Dạ
> Lấp lánh hồn thơ Việt trên sân ga Tokyo chiều cuối năm
You are doing exactly what I was talking about. If you read the data using the wrong encoding, then save it using the same encoding, you can then open it using the correct encoding, provided that the process hasn't removed any data.
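The read-wrong, save-wrong, open-right sequence can be sketched in a few lines. This is only an illustration: it assumes the wrong encoding is Windows-1252 (`cp1252`), which the thread does not confirm, and it uses in-memory bytes in place of the HTML file.

```python
original = "Lâm Thị Mỹ Dạ"

# 1. The UTF-8 bytes are read using the wrong encoding (assumed: cp1252).
garbled = original.encode("utf-8").decode("cp1252")

# 2. The garbled string is saved using that same wrong encoding,
#    which restores the original UTF-8 bytes.
saved_bytes = garbled.encode("cp1252")

# 3. The saved bytes are opened using the correct encoding.
reopened = saved_bytes.decode("utf-8")

print(garbled)
print(reopened == original)
```

The round trip works here only because every byte in this name has a character assigned in the assumed wrong encoding.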
If you have set up your MySQL database to use Unicode, and still get the string out in that manner, the error occurred before you even saved the string in the database in the first place. What you have done is basically:
unicode -> bytes -> wrong encoding -> MySQL -> wrong encoding -> html -> bytes -> browser -> unicode
While this gives the correct result for some strings, some byte codes used in UTF-8 don't represent a single character by themselves, so if you continue to store mis-decoded strings as Unicode, you will sooner or later experience corrupted strings.
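A small sketch of how such corruption can arise, again assuming the wrong encoding is Windows-1252 (`cp1252`; an assumption, not something stated in the thread). The UTF-8 form of "chiều" contains the byte 0x81, which Python's `cp1252` codec treats as undefined, so the byte is either rejected or silently dropped, and the original word can no longer be recovered.

```python
word = "chiều"
raw = word.encode("utf-8")  # the letter "ề" becomes the bytes E1 BB 81

# A strict decode with the assumed wrong encoding fails on byte 0x81.
try:
    raw.decode("cp1252")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decode silently drops the byte, so data is lost.
lossy = raw.decode("cp1252", errors="ignore")

# Reversing the process no longer yields valid UTF-8.
try:
    lossy.encode("cp1252").decode("utf-8")
    recovered = True
except UnicodeDecodeError:
    recovered = False

print(strict_ok, lossy, recovered)
```

The lossy result matches the "chiá»u" visible in the extracted text, which suggests that at least this word can no longer be rebuilt from the stored string alone.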
--
Göran Andersson
_____
http://www.guffa.com