Converting text between various encodings



Hi,
I'm experimenting with converting text strings between various encodings such as
Unicode (UTF-16), UTF-8 and UTF-7. One tool I've found that seems to do at least
some of this is the OlePrn.OleCvt object (on W2K and later, I think), which
has ToUnicode and ToUtf8 methods. TLViewer shows that the ToUnicode method
takes two arguments: a string to be converted and a long integer that
(I think) represents the codepage of that string. It
returns a Unicode string. TLViewer also shows that the ToUtf8 method takes one
argument, apparently a little-endian Unicode string.
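For reference, here is a minimal Python sketch (not VBScript, and not using OlePrn.OleCvt) of the two byte forms involved for the character U+2018. It assumes that the "little endian Unicode string" ToUtf8 accepts is UTF-16LE, which is how VBScript/COM strings are held in memory on Windows.

```python
# Sketch in Python (not VBScript): the byte forms of U+2018
# (LEFT SINGLE QUOTATION MARK), the same character as ChrW(&H2018).
ch = "\u2018"

utf16le = ch.encode("utf-16-le")   # in-memory form of a COM/VBScript string
utf8 = ch.encode("utf-8")          # what a Unicode -> UTF-8 conversion should yield

print(utf16le.hex(" ").upper())    # 18 20  (low byte first: little endian)
print(utf8.hex(" ").upper())       # E2 80 98
```

One UTF-16 code unit becomes three UTF-8 bytes here, so a UTF-8 result held in a VBScript string would be expected to be longer than the input.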

I'm hoping that someone can post a URL for documentation on this
OlePrn.OleCvt object. TLViewer indicates a lot of functionality, with
coclasses exposing many printer-related features. But I'm mostly interested
in whether the ToUnicode method can convert only from UTF-8, or from any
encoding for which my computer has a codepage number. If the latter, this function:
Function fsUTF8ToUnicode(sUTF8String, lCodePage)
should really be called an any-encoding-to-Unicode routine.
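This post does not establish whether OleCvt.ToUnicode honours every installed codepage. The Python sketch below only shows why the codepage argument matters: the same bytes decode to different text depending on which codepage they are assumed to be in. The table CODEPAGE_TO_CODEC and the helper decode_with_codepage are hypothetical. They map a few standard Windows codepage numbers to Python codec names.

```python
# Hypothetical mapping from a few Windows codepage numbers to Python codecs.
# Illustrative only; not a complete list.
CODEPAGE_TO_CODEC = {
    1200: "utf-16-le",  # UTF-16, little endian
    1252: "cp1252",     # Windows Western European ("ANSI" on US systems)
    65000: "utf-7",
    65001: "utf-8",
}


def decode_with_codepage(data: bytes, codepage: int) -> str:
    """Decode raw bytes to a Unicode string, interpreting them per `codepage`."""
    return data.decode(CODEPAGE_TO_CODEC[codepage])


raw = "\u2018".encode("utf-8")             # b'\xe2\x80\x98'
as_utf8 = decode_with_codepage(raw, 65001)  # one character, U+2018
as_ansi = decode_with_codepage(raw, 1252)   # three characters of mojibake
print(len(as_utf8), len(as_ansi))           # 1 3
```

Reading UTF-8 bytes as codepage 1252 "succeeds" but produces the wrong text. If ToUnicode is a general converter, it would behave the same way when given a codepage that doesn't match the data.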

Anyhow, here is a simple script that converts a Unicode character to UTF-8
and back to Unicode:

Option Explicit

' Start with a single Unicode character: U+2018 (LEFT SINGLE QUOTATION MARK).
Dim sUnicode: sUnicode = ChrW(&H2018)
Dim sUTF8, sNewUnicode

' Unicode -> UTF-8
sUTF8 = fsUnicodeToUTF8(sUnicode)

MsgBox "Original Unicode = " & sUnicode & _
    " (&H" & Hex(AscW(sUnicode)) & ")" & _
    vbCrLf & "UTF8: " & sUTF8

' UTF-8 -> Unicode; 65001 is the Windows codepage number for UTF-8.
sNewUnicode = fsUTF8ToUnicode(sUTF8, 65001)

MsgBox "Original Unicode = " & sUnicode & _
    " (&H" & Hex(AscW(sUnicode)) & ")" & _
    vbCrLf & "UTF8: " & sUTF8 & vbCrLf & _
    "New Unicode = " & sNewUnicode & _
    " (&H" & Hex(AscW(sNewUnicode)) & ")"

' Convert a Unicode string to UTF-8 with OlePrn.OleCvt.ToUtf8.
Function fsUnicodeToUTF8(sUnicodeString)
    fsUnicodeToUTF8 = CreateObject("OlePrn.OleCvt").ToUtf8(sUnicodeString)
End Function 'fsUnicodeToUTF8(sUnicodeString)

'list of codepage numbers for various encodings.
'http://www.motobit.com/help/scptutl/cl68.htm
' Convert a string in codepage lCodePage to Unicode with OlePrn.OleCvt.ToUnicode.
Function fsUTF8ToUnicode(sUTF8String, lCodePage)
    fsUTF8ToUnicode = CreateObject("OlePrn.OleCvt").ToUnicode(sUTF8String, lCodePage)
End Function 'fsUTF8ToUnicode(sUTF8String, lCodePage)
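The same round trip can be mirrored in Python to cross-check what the script should report. This assumes ToUtf8 produces a standard UTF-8 byte sequence. In that case you should see the bytes E2 80 98 in the middle and &H2018 at both ends. This sketch uses Python's built-in codecs and does not wrap OlePrn.OleCvt. vb_hex is a hypothetical helper that formats a code point the way the script's "&H" & Hex(AscW(...)) expression does.

```python
# Sketch: Unicode -> UTF-8 -> Unicode round trip with Python's built-in codecs.
# Codepage 65001 (UTF-8) corresponds to Python's "utf-8" codec.
original = "\u2018"

utf8_bytes = original.encode("utf-8")    # analogous to fsUnicodeToUTF8
round_trip = utf8_bytes.decode("utf-8")  # analogous to fsUTF8ToUnicode(..., 65001)


def vb_hex(s: str) -> str:
    """Hypothetical helper: format the first code point like "&H" & Hex(AscW(s))."""
    return "&H" + format(ord(s[0]), "X")


# Prints: Original = &H2018, UTF-8 bytes = E2 80 98, New = &H2018
print(f"Original = {vb_hex(original)}, "
      f"UTF-8 bytes = {utf8_bytes.hex(' ').upper()}, "
      f"New = {vb_hex(round_trip)}")
```

Only hex values are printed, so the output stays readable even on a console that can't display the quotation mark itself.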

-Paul Randall

