Re: Unicode/UTF-8 decoding
- From: Göran Andersson <guffa@xxxxxxxxx>
- Date: Tue, 05 Jun 2007 09:28:59 +0200
Bill Nguyen wrote:
Below is some text I extracted from a MySQL database. How can I decode
it so that I can read it in Unicode?
Thanks
Bill
------------
Virginia Hamilton Adair / Lâm Thị Mỹ Dạ
Lấp lánh hồn thÆ¡ Việt trên sân ga Tokyo chiá»u cuối năm
This text looks as if it has been decoded with a different encoding than
the one used to encode it. The sample looks like UTF-8 bytes that were
read with a single-byte encoding, probably Windows-1252. It might be
possible to recreate the data if you know which encodings were used to
encode and decode it. Then you might be able to encode it back to its
previous state and use the proper encoding to decode it. There is a
great risk that some data has been lost, though, and that you can't
recreate the original data from this stage.
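A minimal sketch of that round trip, written in Python for brevity. It
assumes the text was stored as UTF-8 and wrongly decoded as Windows-1252
(cp1252); that pairing is a guess based on the sample, and the helper
name repair_mojibake is made up for illustration. The second case shows
how a byte with no cp1252 mapping (0x81 here) gets dropped, after which
the original can no longer be recovered.

```python
def repair_mojibake(text, wrong="cp1252", right="utf-8"):
    """Undo a wrong decode: turn the text back into bytes with the codec
    that was wrongly applied, then decode those bytes with the right one.
    Returns None when the round trip fails, i.e. data has been lost.
    Both codec names are assumptions about how the data was damaged."""
    try:
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


# Recoverable case: every UTF-8 byte happens to map to a cp1252 character.
original = "Lâm Thị Mỹ Dạ"
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)                   # the mis-decoded form
print(repair_mojibake(garbled))  # the original text again

# Lossy case: "ề" is E1 BB 81 in UTF-8, and 0x81 is undefined in cp1252.
# Dropping the undecodable byte simulates what a lenient decoder may do.
lossy = "chiều".encode("utf-8").decode("cp1252", errors="ignore")
print(repair_mojibake(lossy))    # None: the missing byte cannot be rebuilt
```

If the repair returns None for many rows, the damage most likely
happened before the text reached you, and the data has to be re-imported
from the original source with the correct encoding.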
If you want to store Unicode strings in the MySQL database, the database
has to be set up to use a Unicode character set.
--
Göran Andersson
_____
http://www.guffa.com