Re: Will Multibyte app work for Chinese users?

From: Joseph M. Newcomer (newcomer_at_flounder.com)
Date: 02/14/05


Date: Mon, 14 Feb 2005 02:14:41 -0500


(a) let this be a lesson: always code Unicode-aware!

(b) Search for all instances of quote marks. Replace all instances of quoted strings (such
as "string") with an _T() macro (e.g., _T("string")). Ditto for characters, 'a' would
become _T('a').

Make sure there are no native-language strings anywhere in your source code; move them to
the STRINGTABLE (although if you are running on a Chinese site, you probably already have
done this). Strings such as _T("%d"), _T("\n"), _T("%d, %s = %d") and the like are
language-independent and therefore do not have to be in the STRINGTABLE.

I've not done Chinese (our products are distributed in Europe, so right now I have only
used left-to-right languages), so I'm not sure about the implications of right-to-left
sequencing; these may be subtler. I can't help you at all in that area.

Replace all instances of the declaration
        char t;
with
        TCHAR t;

Replace all instances of LPSTR or LPCSTR with LPTSTR or LPCTSTR.

Replace all instances of char * with LPTSTR (and const char * with LPCTSTR).

Replace all instances of 'str' functions with the corresponding '_tcs' functions (strlen
becomes _tcslen, strcpy becomes _tcscpy). Other families use different prefixes (_tprintf,
_stprintf, _ttoi, _istalpha); alas, the naming rules are inconsistent, so check tchar.h.
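
The replacements above can be sketched without tchar.h, so the idea compiles on any
platform. This is only an illustration: DEMO_UNICODE, DEMO_TCHAR, DEMO_T and the demo_tcs*
names are hypothetical stand-ins for _UNICODE, TCHAR, _T and the real _tcs* names, and the
real header does the switching for you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for what <tchar.h> selects based on _UNICODE.
// Set DEMO_UNICODE to 0 to get the 8-bit build of the same source.
#define DEMO_UNICODE 1

#if DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;
#define DEMO_T(x) L##x
#define demo_tcslen std::wcslen
#define demo_tcscpy std::wcscpy
#else
typedef char DEMO_TCHAR;
#define DEMO_T(x) x
#define demo_tcslen std::strlen
#define demo_tcscpy std::strcpy
#endif

// Before:  char buf[16];  strcpy(buf, "hello");  return strlen(buf);
// After, written once against the generic-text names:
inline std::size_t copy_and_measure() {
    DEMO_TCHAR buf[16];
    demo_tcscpy(buf, DEMO_T("hello"));
    return demo_tcslen(buf);   // length in characters, not bytes
}
```

The function body is identical in both builds; only the typedef and macros change, which
is the whole point of the generic-text names.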

Make sure that all instances of sizeof() that apply to character-string sizes are replaced
by a suitable computation, e.g.,

char p[1000];
SomeAPI(..., p, sizeof(p));

needs a macro such as

#define DIM(x) ( sizeof(x) / sizeof((x)[0]) )

TCHAR p[1000];
SomeAPI(..., p, DIM(p));
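
To see why sizeof() alone is wrong once characters are wider than one byte, here is a
small portable sketch. It uses wchar_t directly as a stand-in for TCHAR in a Unicode
build; BufferSizes and wide_buffer_sizes are hypothetical names used only for this
illustration.

```cpp
#include <cassert>
#include <cstddef>

#define DIM(x) (sizeof(x) / sizeof((x)[0]))

// Byte size and element count of the same buffer.
struct BufferSizes {
    std::size_t bytes;
    std::size_t elements;
};

inline BufferSizes wide_buffer_sizes() {
    wchar_t p[1000] = {0};                   // what TCHAR p[1000] becomes under Unicode
    BufferSizes s = { sizeof(p), DIM(p) };   // sizeof counts bytes, DIM counts characters
    return s;
}
```

An API that expects a character count and is handed sizeof(p) would believe the buffer
is larger than it really is, which is a buffer overrun waiting to happen.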

In other cases, make sure you have a multiplier or divisor of sizeof(TCHAR) in the right
places, e.g.,

CString p = ...;

WriteFile(h, (LPCTSTR)p, _tcslen(p) * sizeof(TCHAR), &bytesWritten, NULL);
or
WriteFile(h, (LPCTSTR)p, p.GetLength() * sizeof(TCHAR), &bytesWritten, NULL);

The multiplication is critical here because WriteFile takes a byte count, not a character
count. Watch out for those multipliers (or divisors, as appropriate); they are the source
of most of the remaining bugs.
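
One way to keep the multipliers and divisors in one place is a pair of small helpers. This
is a sketch using std::basic_string rather than CString so that it is portable;
byte_count and char_capacity are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes occupied by the characters of s (terminator excluded).
// This is the value a byte-oriented API such as WriteFile would need.
template <typename Ch>
std::size_t byte_count(const std::basic_string<Ch>& s) {
    return s.length() * sizeof(Ch);
}

// Whole characters that fit in a buffer of the given size in bytes.
// This is the value a character-oriented API would need.
template <typename Ch>
std::size_t char_capacity(std::size_t bytes) {
    return bytes / sizeof(Ch);
}
```

With char both helpers are the identity, which is exactly why the bug stays hidden until
the Unicode build.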

Actually, I've found that I can do this transformation to about 60K lines of code in about
three days, and get it right just about immediately. It isn't actually that hard. The
problem is that the one or two places you miss will come back and kill you, so be sure you
test thoroughly. Code that I create rarely has any problems; in the few cases where I
"cheat", I make sure the code will fail to compile under _UNICODE/UNICODE (you need to set
BOTH predefined symbols!). It generally takes me no more than ten minutes to make the fix
needed. This happens perhaps once in every three or four programs.
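
The post does not say how the compilation error is forced. One possible approach, assuming
a C++11 compiler, is a static_assert on the character size; on older compilers an
#ifdef _UNICODE / #error pair does the same job. DEMO_TCHAR and raw_copy are hypothetical
names for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

typedef char DEMO_TCHAR;   // stand-in for TCHAR; a Unicode build would make this wchar_t

// A "cheating" helper that treats a byte count as a character count.
// The static_assert stops the build as soon as characters stop being one byte wide.
inline std::size_t raw_copy(DEMO_TCHAR* dst, const DEMO_TCHAR* src, std::size_t n) {
    static_assert(sizeof(DEMO_TCHAR) == 1,
                  "raw_copy assumes 8-bit characters; fix before building as Unicode");
    std::memcpy(dst, src, n);   // n is in bytes, equal to characters only for char
    return n;
}
```

Changing the typedef to wchar_t makes the file fail to compile at the assertion, so the
shortcut cannot silently survive the conversion.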

Basically, you just have to sit down and spend a few days of tedious editing, and it will
be mostly right. Just make sure you test thoroughly to get those last few cases. Then it
will be completely right.

Introducing MBCS is going to be vastly more difficult, because a lot of what you
implicitly assume is now wrong. For example, this no longer iterates over the characters
of a string:
        LPCTSTR p = ...;
        while(*p != _T('\0'))
           {
            ...
            p++; // fatal error in MBCS: may land on a trail byte
           }
and you have to use the appropriate advance operation; I believe it is _tcsinc (or the
CharNext API), but I've never used MBCS; I just know that it is laden with traps like
this. So unless you know Unicode won't work, I'd suggest using only Unicode.
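
The trap can be shown portably with a hand-rolled lead-byte test. This is a simplified
sketch that assumes code page 936 (GBK), where lead bytes fall in 0x81..0xFE; real code on
Windows would ask the system instead (IsDBCSLeadByte, _ismbblead, or simply _tcsinc).
is_lead_byte, count_bytes and count_chars are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Simplified lead-byte test, assuming code page 936 (GBK).
inline bool is_lead_byte(unsigned char c) {
    return c >= 0x81 && c <= 0xFE;
}

// What a plain p++ loop counts: bytes.
inline std::size_t count_bytes(const char* p) {
    std::size_t n = 0;
    while (*p != '\0') {
        ++n;
        ++p;
    }
    return n;
}

// What you actually want: characters. A lead byte and its trail byte
// are consumed together as one character.
inline std::size_t count_chars(const char* p) {
    std::size_t n = 0;
    while (*p != '\0') {
        if (is_lead_byte(static_cast<unsigned char>(*p)) && p[1] != '\0')
            p += 2;   // double-byte character
        else
            p += 1;   // single-byte character (or a truncated lead byte)
        ++n;
    }
    return n;
}
```

For the five-byte GBK string "A\xD6\xD0\xCE\xC4" (the letter A followed by two Chinese
characters), the byte loop visits five positions while the character loop visits three.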

The problems often arise when you are dealing with network traffic, which might be MBCS or
8-bit only, or you might be reading MBCS from some other source stream. In that case, use
the MultiByteToWideChar API on input (and WideCharToMultiByte on output) to do the right
conversion, and keep it consistently Unicode inside the app.
                                        joe

On Sun, 13 Feb 2005 14:06:07 +0000, Mike <Mike@home.net> wrote:

>Hi,
>I have an application which I built in vc 6.
>A Chinese user running Simplified Chinese Win2000 pro has reported
>that it fails on his system. I presume this is because I didn't write
>and compile with Unicode support.
>Now I've upgraded to VS.Net 2003, and am considering building a
>Unicode version of my app. However, I have noticed the multibyte
>option. Will multibyte compilation solve my problem with Chinese
>users? Will I still have to add _T for literals, change char* to
>LPTSTR etc?
>My application is for Win 2000 and above.
>
>Thanks.
>Mike

Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm


