Re: UTF-8



It depends, Lou. If your locale is appropriate for the characters you want to
show, then it's easy. If not, then it is difficult.

For instance, if your UTF-8 represents Chinese characters and your locale
is currently Chinese, then it's only a few lines of code. However, if your
locale is something like West European, then I'm not sure I have a reliable
answer I can give you.
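To make the locale point concrete, here is a minimal Python sketch (not VB6) of the underlying check: a non-Unicode ("ANSI") control can only show characters that exist in the active ANSI code page. It assumes cp936 and cp1252 as stand-ins for the Chinese and West European code pages; `fits_codepage` is a hypothetical helper.

```python
# Sketch only: Python, not VB6. cp936 (Chinese, PRC) and cp1252
# (West European) are assumed stand-ins for the system ANSI code page.


def fits_codepage(text: str, codepage: str) -> bool:
    """Return True if every character of text can be encoded in codepage."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


sample = "\u4e2d\u6587"  # two Chinese characters

print(fits_codepage(sample, "cp936"))   # True: representable in the Chinese code page
print(fits_codepage(sample, "cp1252"))  # False: no West European equivalent
```

On a West European system the Chinese characters simply have no ANSI representation, which is why that case is the hard one.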

I think someone suggested using a WebBrowser control. That's probably the
best solution I've heard for the difficult case, since Internet Explorer takes
care of its own fonts for different locales.
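For the WebBrowser route, one possible approach (an assumption, not something described in this thread) is to wrap the text in a small HTML page that declares UTF-8 and let the browser choose the fonts. A minimal Python sketch; `wrap_as_html` and `write_html` are hypothetical names:

```python
import html
from pathlib import Path


def wrap_as_html(text: str) -> str:
    """Wrap plain text in a minimal HTML page that declares UTF-8."""
    return (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        "</head><body><pre>"
        + html.escape(text)  # escape <, > and & so markup in the text shows literally
        + "</pre></body></html>"
    )


def write_html(text: str, path: Path) -> None:
    # Write the page as UTF-8 bytes so the declared charset matches the file.
    path.write_text(wrap_as_html(text), encoding="utf-8")
```

The browser control would then be pointed at the written file; the `<pre>` element keeps the original line breaks.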

Here's a bit of code that will convert UTF-8 text from a file to Unicode
(which is what VB uses in memory). The sample then writes the Unicode to a
different ANSI file, but you can just cut that bit out:
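The decoding step itself is simple once the bytes are treated as UTF-8. This is a Python sketch of the idea, not the linked VB code; `read_utf8_file` is a hypothetical helper, and stripping a byte-order mark is an assumption about how the file was saved. In VB6 itself the same conversion is commonly done with the Win32 `MultiByteToWideChar` function and code page 65001 (CP_UTF8).

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def read_utf8_file(path: Path) -> str:
    """Read a UTF-8 text file and return its contents as a Unicode string."""
    data = path.read_bytes()
    # Some editors (e.g. Notepad) prepend a byte-order mark; drop it.
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    # Decode strictly so a file that is not really UTF-8 fails loudly
    # instead of silently producing scrambled text.
    return data.decode("utf-8")
```

Getting the text into a Unicode string is only half the job; displaying it in an ANSI control is the locale-dependent part.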
http://groups.google.ie/group/microsoft.public.vb.general.discussion/msg/00f3c3fd8182563e?hl=en

Tony Proctor

"Lou" <lou.garvin@xxxxxxxxxxx> wrote in message
news:OB5LpTvVJHA.3688@xxxxxxxxxxxxxxxxxxxxxxx
This solution did not work for me. The characters came back scrambled as:
OaECO?¡Mo2aEO!¢GA¡PIeOAA¡¦2aEu£gAEi?t!¢G 2008Ae12OA2EO!¢G

whereas in the UTF-8 text file the characters are:
<p>????????????????? 2008?12?2??</p>

I'm baffled and frustrated at how difficult this is.
I just want to load a UTF-8 text file into a text box.
Surely there must be a solution?
Any help is appreciated.

-Lou

"Bob Riemersma" <nospam@xxxxxxx> wrote in message
news:OwWoxvmVJHA.4024@xxxxxxxxxxxxxxxxxxxxxxx
<mark.tunnard.jackson@xxxxxxxxxxxxxx> wrote in message
news:5619f598-d2cb-4249-951d-ba36fffdf78f@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hi

There are two steps:
1 - get the characters from the UTF-8 file into a VB6 string (Unicode)
2 - put those characters into a VB6 textbox and have them legible

This logic addresses both issues:

Option Explicit
'
'Some VB6 controls, despite being "ANSI" in nature, do indeed
'support international codepages via their Font.Charset
'property.
'
'Notes:
'
'The UTF-8 file is read using an ADODB.Stream object. The project
'needs a reference to "Microsoft ActiveX Data Objects 2.5 Library"
'(or later), since the Stream object was introduced in ADO 2.5.
'
'The Charset of Text1 has been set at design time
'via the Font property dialog in the IDE. The face is set to
'Arial Unicode MS and the script (Charset) is set to
'Charset_GB2312.
'

Private Const LCID_PRC As Long = 2052

Private Function LocalizeANSI(ByVal Expression As String, _
                              ByVal LCID As Long) As String
    'Converts Unicode Expression to "ANSI" using an alternate LCID,
    'then converts the result back to Unicode using the current
    'system LCID.
    '
    'This "scrambles" the data, but in such a way that a VB6 control's
    'Font.Charset property can accomplish proper symbol mapping.

    LocalizeANSI = StrConv(StrConv(Expression, vbFromUnicode, LCID), _
                           vbUnicode)
End Function

Private Sub Form_Load()
    Dim stmUTF8 As ADODB.Stream

    Set stmUTF8 = New ADODB.Stream
    With stmUTF8
        .Type = adTypeText
        .Charset = "utf-8"
        .Open
        'Relative path: resolved against the current directory.
        'Prefix it with App.Path & "\" to load from the program folder.
        .LoadFromFile "utf8.txt"
        Text1.Text = LocalizeANSI(.ReadText(adReadAll), LCID_PRC)
        .Close
    End With
End Sub
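To see why the double StrConv "scrambles" the text in a recoverable way, here is a Python sketch of the byte-level round trip. It assumes LCID 2052 corresponds to code page 936 and that the system code page is 1252 (West European); `localize_ansi` and `render_with_gb2312_font` are hypothetical names, and the second function only approximates what the GB2312-charset font does with the control's ANSI text.

```python
# Sketch only. Assumptions (not from the original post): LCID_PRC (2052)
# maps to ANSI code page 936, and the system ANSI code page is 1252.
# Note: Python's cp1252 codec rejects the five undefined bytes
# 0x81, 0x8D, 0x8F, 0x90 and 0x9D, so some inputs can raise here.

TARGET_CP = "cp936"   # code page behind LCID_PRC
SYSTEM_CP = "cp1252"  # assumed system ANSI code page


def localize_ansi(text: str) -> str:
    """Mimic StrConv(StrConv(text, vbFromUnicode, 2052), vbUnicode)."""
    ansi_bytes = text.encode(TARGET_CP)  # Unicode -> Chinese ANSI bytes
    return ansi_bytes.decode(SYSTEM_CP)  # bytes -> Unicode via system code page


def render_with_gb2312_font(scrambled: str) -> str:
    """Approximate how a GB2312-charset font interprets the control's text."""
    ansi_bytes = scrambled.encode(SYSTEM_CP)  # ANSI control converts back to bytes
    return ansi_bytes.decode(TARGET_CP)       # font reads those bytes as GB2312


scrambled = localize_ansi("\u4e2d\u6587")       # "\u00d6\u00d0\u00ce\u00c4" (looks like Latin gibberish)
restored = render_with_gb2312_font(scrambled)   # "\u4e2d\u6587" again
```

The "scrambled" string is just the Chinese ANSI bytes carried through the system code page, so nothing is lost as long as the font reinterprets them with the matching charset.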




