Re: Unicode/UTF-8 decoding

Göran ;

I think you are correct. However, there is not much I can do, since I cannot
change the host server parameters.
I am using SQLyog to access MySQL remotely. What I need is to be able to
read the data in its correct format/encoding scheme. Is this possible with
.NET?

Thanks

Bill

"Göran Andersson" <guffa@xxxxxxxxx> wrote in message
news:eWl7rU4pHHA.4100@xxxxxxxxxxxxxxxxxxxxxxx
Bill Nguyen wrote:
I set UTF-8 as the default encoding in MySQL.
I don't really know how this works, but the IE and Firefox browsers can
decode it easily.
This is the test:
I put the lines below in an HTML document and viewed it in IE, and it
worked (make sure to set the encoding to UTF-8 in the View menu).
I have included test.htm for your testing. (The text is in Vietnamese.)
So I think what I need is to find a utility with the same function,
which might already be available out there. Any help is greatly
appreciated.

Bill

----------------
<html>

<head></head>

<body>



Virginia Hamilton Adair / Lâm Thá»< Mỹ Dạ
Lấp lánh há»"n thÆ¡ Viá»?t trên sân ga Tokyo chiá»?u cuá»'i nÄfm


</body>

</html>



"Göran Andersson" <guffa@xxxxxxxxx> wrote in message
news:%23rhR3M0pHHA.1776@xxxxxxxxxxxxxxxxxxxxxxx
Bill Nguyen wrote:
Below is some text I extracted from a MySQL database. How can I decode
it so that I can read it in Unicode?
Thanks

Bill

------------

Virginia Hamilton Adair / Lâm Thá»< Mỹ Dạ
Lấp lánh há»"n thÆ¡ Viá»?t trên sân ga Tokyo chiá»?u cuá»'i nÄfm

This text looks as if it has been decoded with a different encoding than
the one used to encode it. It might be possible to recreate the data if you
know which encodings were used to encode and decode it. Then you might be
able to encode it back to its previous state and use the proper encoding
to decode it. There is a great risk that some data has been lost,
though, and that you can't recreate the original data from this stage.
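
A minimal sketch of that "encode it back, then decode properly" idea, written in Python for brevity (.NET offers the same two steps through System.Text.Encoding). It assumes the text was UTF-8 that got decoded as Windows-1252; that wrong encoding is a guess, not something confirmed in this thread, and the helper name repair_mojibake is made up for illustration.

```python
# Sketch: undo mojibake by reversing the wrong decode step.
# Assumption: UTF-8 bytes were mistakenly decoded as Windows-1252 (cp1252).

def repair_mojibake(garbled: str, wrong_encoding: str = "cp1252") -> str:
    """Re-encode with the wrong encoding to recover the raw bytes,
    then decode those bytes as UTF-8. Raises UnicodeError if the
    text cannot be mapped back, i.e. data was lost along the way."""
    raw = garbled.encode(wrong_encoding)
    return raw.decode("utf-8")

# Simulate the damage on a short phrase, then repair it.
original = "Lâm trên sân ga Tokyo"
garbled = original.encode("utf-8").decode("cp1252")
repaired = repair_mojibake(garbled)
```

If the call raises an error on real data, some bytes were probably lost or altered earlier, and the wrong-encoding guess may also need revisiting.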

If you want to store Unicode strings in the MySQL database, it has to be
set up to use a Unicode character set.

--
Göran Andersson
_____
http://www.guffa.com

------------------------------------------------------------------------

> Virginia Hamilton Adair / Lâm Th? M? D? > L?p lánh h?n tho Vi?t trên
sân ga Tokyo chi?u cu?i nam

You are doing exactly what I was talking about. If you read the data using
the wrong encoding and then save it using the same encoding, you can then
open it using the correct encoding, provided that the process hasn't
removed any data.

If you have set up your MySQL database to use Unicode and still get the
string out in that manner, the error occurred before you even saved the
string in the database in the first place. What you have done is basically:

unicode -> bytes -> wrong encoding -> MySQL -> wrong encoding -> html ->
bytes -> browser -> unicode

While this gives the correct result for some strings, some byte codes used
in UTF-8 don't represent a single character by themselves, so if you
continue to store mis-decoded strings as Unicode, you will sooner or later
experience corrupted strings.

--
Göran Andersson
_____
http://www.guffa.com

