Re: UNICODE to MBCS



I believe I have figured part of this out from various sources on the
net. What appears to be happening is that when I read the
Unicode-encoded (UTF-16) INI file, the Windows API
GetPrivateProfileStringA internally converts the text from Unicode to
the ANSI encoding. I believe this because of the following tests I
have run. In each of the tests below I was reading the Unicode-encoded
file into an MBCS-compiled DLL using the Win32 API
GetPrivateProfileStringA. I then displayed the result in a
document (Crystal Report) using the Arial Unicode font.

OS = English Regional Settings
INI = UNICODE (UTF-16) containing code points for English and Chinese
characters
When the regional settings were set to English, and thus using the
1252 code page, the English characters came through after a
conversion in the Win32 API from Unicode >> English ANSI. But the
Chinese characters were all unresolved and appeared as '?'. This
makes sense because the 1252 code page does not have equivalent code
points for the Chinese characters.
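
To illustrate the effect, here is a minimal sketch in Python. It does
not call GetPrivateProfileStringA; it only assumes that the API's
internal conversion behaves like encoding to the code page with a '?'
substitute for anything that has no equivalent. The helper name to_ansi
and the sample string are made up for the illustration.

```python
def to_ansi(text, codepage):
    """Encode text to a legacy code page and decode it back.

    errors="replace" makes the encoder emit '?' for every character
    the code page cannot represent (an assumption about how the
    Windows conversion behaves, not a call into it).
    """
    return text.encode(codepage, errors="replace").decode(codepage)


sample = "Hello 中文"              # hypothetical INI value
print(to_ansi(sample, "cp1252"))   # Hello ??
```

The English part survives and each Chinese character collapses to a
single '?', which matches what I saw in the report.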

OS = Chinese
INI = UNICODE (UTF-16) containing code points for English and Chinese
characters
Under these conditions the same conversion was happening, except that a
different code page was being used because my default locale at that
time was set to Chinese. Here the Unicode >> Chinese ANSI conversion
displayed everything correctly because the Chinese code points had
equivalents in the ANSI code page (code page 936?). And of course the
English was displaying because most (all?) code pages have 32-127
devoted to ASCII.
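
The same kind of sketch for the Chinese case, again using Python's
codecs as a stand-in for the Windows conversion. Python's "cp936" is an
alias for its GBK codec, which may differ in small details from the
Windows table, so treat this as an approximation. The sample string is
made up.

```python
sample = "Hello 中文"  # hypothetical INI value

# Chinese characters have double-byte equivalents in code page 936,
# so the text survives a round trip without any '?' substitution.
encoded = sample.encode("cp936")
round_trip = encoded.decode("cp936")
print(round_trip == sample)
print(len(encoded))  # 6 single-byte characters + 2 double-byte ones

# Plain ASCII text produces the same bytes in each of these code pages.
ascii_bytes = {cp: "Hello".encode(cp) for cp in ("cp1252", "cp936", "cp932")}
print(ascii_bytes)
```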

I then took the above tests a step further and noticed that I could
place code points from other languages into the INI file and have them
appear when using the Chinese regional settings. I was seeing Russian
and Japanese in the result, which added to my confusion. I then did some
more looking around and found that there are some mega code pages that
have been developed recently for the Chinese language, like GB
18030-2000, which is based on GB 2312-1980. These code pages include not
only Chinese but a whole array of characters from different languages.
Given that, I can't help but wonder how far that support goes. Would it
be safe for me to set a system's regional settings to Chinese when I
really want to show Japanese text? Will these mega code pages cover me?
My gut feeling is no. There are most likely certain characters
that are only available in that language's own code page.
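
One way to probe this is to list which characters of a string a given
code page cannot encode. The sketch below uses Python's codec tables
("cp936" for Chinese, "cp932" for Japanese), which I am assuming are
close enough to the Windows code pages for a rough check. The helper
name unencodable and the sample strings are made up.

```python
def unencodable(text, codepage):
    """Return the characters of text that codepage cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


samples = {
    "Russian": "Привет",
    "Japanese": "カタカナ・ひらがな",  # contains the katakana middle dot
}

for name, text in samples.items():
    for cp in ("cp936", "cp932"):
        print(name, cp, unencodable(text, cp))
```

With Python's tables, the Russian and the basic kana go through cp936,
but the katakana middle dot in the Japanese sample does not, while
cp932 handles the whole string. So at least by this rough check the gut
feeling holds: the Chinese code page covers a lot of Japanese, but not
everything.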

I hope this helps someone out there.
