Re: Help with UNICODE.
- From: Joseph M. Newcomer <newcomer@xxxxxxxxxxxx>
- Date: Mon, 24 Jul 2006 16:57:27 -0400
If it was done without any consideration for Unicode, you have to do several things:
Find every occurrence of 'char' in your code. Unless it is demonstrably required to
represent 8-bit characters, you must replace it with 'TCHAR'.
char -> TCHAR
char * -> LPTSTR
const char * -> LPCTSTR
Then you must find all literal strings and character constants and put _T() around them, e.g.,
"cat" -> _T("cat")
'A' -> _T('A')
You must find and fix all occurrences of any str-function in your code, e.g.,
strcpy -> _tcscpy
strcmp -> _tcscmp
etc.
However, this is complicated because the mapping is not a uniform rename. The generic-text
names in tchar.h carry a leading underscore (_tcsXXX for strXXX), and functions outside the
str family follow other patterns (sprintf -> _stprintf, atoi -> _ttoi, fopen -> _tfopen),
so you have to look at tchar.h to figure out which one to use.
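To make the mechanics concrete, here is a minimal, portable imitation of what tchar.h does.
The DEMO_ names are hypothetical stand-ins (for TCHAR, _T, _tcslen and _tcscmp) so that the
sketch compiles without the Windows headers; the real header is much larger and keys off
_UNICODE rather than DEMO_UNICODE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Simplified imitation of the tchar.h mechanism. All DEMO_/demo_ names are
// hypothetical; they are not part of any real header.
#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;
#define DEMO_T_(x) L##x
#define DEMO_T(x) DEMO_T_(x)          // "cat" becomes L"cat"
#define demo_tcslen std::wcslen
#define demo_tcscmp std::wcscmp
#else
typedef char DEMO_TCHAR;
#define DEMO_T(x) x                   // "cat" stays "cat"
#define demo_tcslen std::strlen
#define demo_tcscmp std::strcmp
#endif

// Written once against the generic names; compiles in either mode.
inline bool is_cat(const DEMO_TCHAR* s)
{
    assert(s != nullptr);
    return demo_tcscmp(s, DEMO_T("cat")) == 0;
}
```

Building the same source with and without -DDEMO_UNICODE switches every type, literal and
call at once, which is the whole point of the TCHAR discipline.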
Make sure you have not accidentally confused BYTE and char; a BYTE is always an unsigned
8-bit quantity (e.g., a pixel value in a bitmap), while char usually represents a text
character and is signed by default in VC++ (it is unsigned only if you compile with /J).
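A small sketch of why the distinction matters. The two functions below are hypothetical
helpers, and the negative result assumes the usual two's-complement conversion to
signed char:

```cpp
#include <cassert>
#include <cstddef>

// Sum raw bitmap bytes the right way: as unsigned 8-bit values (0..255).
inline long sum_as_bytes(const unsigned char* data, std::size_t count)
{
    assert(data != nullptr || count == 0);
    long total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += data[i];
    return total;
}

// The same bytes viewed as signed characters: values above 127 turn negative.
inline long sum_as_signed_chars(const unsigned char* data, std::size_t count)
{
    assert(data != nullptr || count == 0);
    long total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += static_cast<signed char>(data[i]);
    return total;
}
```

For the pixel values 200 and 100 the first function yields 300, while the second treats
200 as a negative number and yields a much smaller sum.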
Make sure that any time you have used sizeof() with respect to a character array that you
now change it to multiply or divide by sizeof(TCHAR) when needed.
char p[] = "This is a test";
WriteFile(h, p, strlen(p), &bytesWritten, NULL);
has to become
TCHAR p[] = _T("This is a test");
WriteFile(h, p, _tcslen(p)*sizeof(TCHAR), &bytesWritten, NULL);
because _tcslen returns the length in CHARACTERS, and WriteFile wants a length in BYTES. So
to get the correct length, you have to multiply by sizeof(TCHAR).
But since the size of p is actually known at compile time, had you written
WriteFile(h, p, sizeof(p), &bytesWritten, NULL);
it has to remain
TCHAR p[] = _T("This is a test");
WriteFile(h, p, sizeof(p), &bytesWritten, NULL);
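because you want to write all the bytes (note that sizeof(p) also counts the terminating
NUL character, which the strlen version did not write). Similarly, had you written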
char buffer[MAX_PATH];
GetModuleFileName(NULL, buffer, sizeof(buffer));
you now have to write
TCHAR buffer[MAX_PATH];
GetModuleFileName(NULL, buffer, sizeof(buffer)/sizeof(TCHAR));
because in a Unicode build sizeof(buffer) would return 2 * MAX_PATH (sizeof is ALWAYS in
bytes) but you only have MAX_PATH characters available, so you have to compensate for it.
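The three quantities discussed above (bytes of text, bytes of the whole array, and number
of elements) can be sketched portably with wchar_t standing in for a Unicode-mode TCHAR.
The helper names element_count and text_bytes are hypothetical; VS2005 also ships a
_countof macro that serves the same purpose as the first one.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Number of elements in a fixed-size array: what an API that takes a
// "size in characters" expects. Equivalent to sizeof(a) / sizeof(a[0]).
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) noexcept
{
    return N;
}

// Number of bytes occupied by the text of a wide string, excluding the
// terminating NUL: what an API that takes a "size in bytes" expects.
inline std::size_t text_bytes(const wchar_t* s)
{
    assert(s != nullptr);
    return std::wcslen(s) * sizeof(wchar_t);
}
```

Because element_count only accepts real arrays, passing a pointer by mistake fails to
compile, which the raw sizeof division would not catch.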
Life gets dicey if you have to deal with 8-bit external data, such as network messages,
files, etc. Here you really do have to use char, and convert the 8-bit data to Unicode
data. Exactly what the best strategy for this is depends on your application domain.
For pure Unicode conversion, you can use the ATL macro A2W (declare USES_CONVERSION in the
function first) to convert 8-bit to 16-bit data.
You can also use CStringA to hold 8-bit data and pass it around to other CStringA values.
But if you use a CStringW (or a CString, in Unicode mode) there will be an automatic
conversion if you use a constructor, and a compiler error if you try to use assignment.
If you need more control over what is going on, or need some encoding like UTF-8, you can
use MultiByteToWideChar yourself, and to convert back you can use WideCharToMultiByte.
(For the simple case, A2W and W2A will do the job; if the code must also build as an ANSI
version, use A2T and T2A instead.)
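On Windows you would normally let MultiByteToWideChar (with CP_UTF8) do the UTF-8 case for
you. To show what such an 8-to-16 conversion actually involves, here is a portable,
hand-rolled sketch. The function name utf8_to_utf16 is hypothetical, and this is not a
full validator: malformed or truncated input becomes U+FFFD, and overlong encodings are
not rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
inline std::u16string utf8_to_utf16(const char* bytes, std::size_t size)
{
    assert(bytes != nullptr || size == 0);
    std::u16string out;
    std::size_t i = 0;
    while (i < size) {
        // Go through unsigned char so bytes above 0x7F do not sign-extend.
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;          // continuation bytes expected
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }   // invalid lead byte

        bool ok = (size - i > extra);   // enough bytes left in the input?
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok || cp > 0x10FFFF) { out.push_back(0xFFFD); ++i; continue; }
        i += extra + 1;

        if (cp >= 0x10000) {            // outside the BMP: surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

The point to take away is that one 8-bit character does not always become one 16-bit
character, so buffer sizes have to be computed from the converted data, not guessed.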
A Unicode file will often (but is not required to) start with the bytes FF FE
(little-endian, such as x86 machines) or FE FF (big-endian, such as 68K, PowerPC, Sparc),
to indicate the "endianness" of the Unicode characters. This is referred to as the Byte
Order Mark (BOM).
If you open a file and find this, you have a high confidence it is Unicode, and you
actually know the encoding. You would throw this away as a non-interesting character. If
you don't find it, you have a reasonable confidence it is a file of 8-bit bytes. In
general, you might want to extend the file-open dialog with a dropdown with options such
as "Open as Unicode", "Open as 8-bit characters", or "Auto-detect" (which should be the
default) to give the user specific control when the BOM is either absent or misleading.
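A minimal sketch of the "Auto-detect" check, assuming the first bytes of the file have
already been read into memory. The enum and function names are hypothetical, and only the
two 16-bit marks discussed above are recognized:

```cpp
#include <cassert>
#include <cstddef>

enum class TextEncoding { Utf16LittleEndian, Utf16BigEndian, NoBom };

// Inspect the first two bytes of a file for a 16-bit Byte Order Mark.
// NoBom means "probably 8-bit data", not a guarantee.
inline TextEncoding detect_bom(const unsigned char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    if (size >= 2) {
        if (data[0] == 0xFF && data[1] == 0xFE)
            return TextEncoding::Utf16LittleEndian;
        if (data[0] == 0xFE && data[1] == 0xFF)
            return TextEncoding::Utf16BigEndian;
    }
    return TextEncoding::NoBom;
}
```

When a mark is found, the caller would skip those two bytes before interpreting the rest
of the file; the user's explicit choice in the dialog should still override the result.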
That may be about 95% of what you need to do to make things Unicode-compliant. I've
usually managed to do these edits in a few days, and it almost always works correctly in
all cases. But I've been nuked occasionally by missing a key 8-to-16, 16-to-8, or
never-translate-from-8-bit situation.
This is why I always now code "Unicode-aware", and in those places where I have to worry
about it but it isn't critical, I make sure to leave comments and have a piece of code
that won't compile if Unicode is enabled (some clients don't want to pay for full Unicode
compliance in the deliverable).
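One way such a guard might look, as a sketch only: the message text and the helper
legacy_length are hypothetical, and the check covers both UNICODE (used by the Windows
headers) and _UNICODE (used by tchar.h).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// This code has NOT been reviewed for Unicode. Refuse to build in a Unicode
// configuration until someone has audited it.
#if defined(UNICODE) || defined(_UNICODE)
#error "legacy_length() assumes 8-bit char text; review before enabling Unicode"
#endif

// Hypothetical 8-bit-only helper protected by the guard above.
inline std::size_t legacy_length(const char* s)
{
    assert(s != nullptr);
    return std::strlen(s);
}
```

The build breaks loudly at the exact spot that needs attention, instead of compiling and
misbehaving at run time.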
joe
On Wed, 19 Jul 2006 20:01:02 -0700, William GS <WilliamGS@xxxxxxxxxxxxxxxxxxxxxxxxx>
wrote:
Hello everybody. I have a project created with VS Wizard (VC6), I have to
compile it with UNICODE compliance in VS2005; what settings have I to change?
is there another change to do?
Thanks in advance,
William GS
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm