Re: TCHAR string?
- From: Daniel James <wastebasket@xxxxxxxxxxxxxxxx>
- Date: Sun, 08 Jan 2006 13:43:20 GMT
In article news:<OlPsmK$EGHA.2320@xxxxxxxxxxxxxxxxxxxx>, Jonathan Wood
wrote:
> > ANSI is just as wrong as ASCII in that context.
>
> Then what does the "A" stand for at the end of Windows API functions
> that accept non-Unicode strings?
According to Microsoft's documentation the 'A' functions are "ANSI"
rather than "Unicode" -- but that doesn't mean that "ANSI" is actually
the right term to use.
What MS generally mean by "ANSI" is either plain-old 7-bit ASCII or one
of the 8-bit encodings like ISO 8859-1 or Windows CP 1252. They don't
intend to restrict the meaning to the Latin/Western encodings that might
be used in the USA, so "ISO" might be a better term than "ANSI". (I'm
pretty sure that ANSI adopts most, if not all, of the encoding standards
published by ISO, but it seems better to use the more
internationally-recognized term when talking about standards for
internationalization.)
The 'A' Windows API functions are used in "MBCS" builds, which tells us
that they're intended for multibyte (as well as single-byte) encodings
based on narrow chars, so the 'A' suffix is a bit misleading. I don't
know how well the 'A' functions perform if you actually call them with
multibyte sequences (e.g. UTF-8) representing characters with codes
greater than 255. I tend to regard "MBCS" as a misspelling of "SBCS", and
I use Unicode if I need to encode anything outside the scope of
ISO 8859-1.
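For illustration, here is a minimal portable C sketch of UTF-8 encoding (utf8_encode is a hypothetical helper, not a Windows API). It shows why a multibyte build has to cope with more than one narrow char per character: in CP 1252 'é' is the single byte 0xE9, but in UTF-8 it takes two bytes, and anything above U+07FF takes three or four. The sketch assumes its input is a valid Unicode scalar value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8 into out[0..3].
   Returns the number of narrow chars (bytes) written.
   Assumes cp is a valid scalar value: <= 0x10FFFF and not a surrogate. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx followed by three continuation bytes */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

So 'é' (U+00E9) becomes 0xC3 0xA9 and the euro sign (U+20AC) becomes 0xE2 0x82 0xAC; for a UTF-8 string, strlen() counts bytes, not characters.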
In fact, although Unicode is not itself an ISO standard, it is the
official way to implement ISO/IEC 10646, so one could say that Unicode is
just as much an ISO (and so an ANSI) encoding as any of the ISO encodings,
such as 8859-1, used in MBCS builds. The terminology MS uses is really
quite misleading in a number of ways.
> I thought the whole point of Unicode was that only one TCHAR was
> required for each character?
The whole point of Unicode is to provide a standard encoding for a huge
range of different characters and marks; see www.unicode.org for (lots)
more information.
The whole point of wide char encoding (which is called Unicode in
Windows) *was* to be able to represent any of the characters of the
Unicode character set in a single fixed-width wchar variable, as you say.
At the time that the wchar type was defined there were fewer than 64k
Unicode characters defined, and a 16-bit wchar was sufficient.
Unfortunately the number of code points defined by Unicode has
increased, and now one needs 21 bits to enumerate them all (the highest
is U+10FFFF), so a 16-bit wchar doesn't cut it any longer. When using
wide chars, one generally either ignores any characters whose codes can't
be expressed in 16 bits, or uses UTF-16, which represents each such
character as a pair of wchars (a surrogate pair).
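A minimal sketch of how UTF-16 squeezes a 21-bit code point into 16-bit units. It uses uint16_t to stand in for Windows' 16-bit wchar_t, so it behaves the same on platforms where wchar_t is 32 bits (as with GCC on Linux); utf16_encode is a hypothetical helper name. Code points up to U+FFFF fit in one unit; anything higher is offset by 0x10000 and its remaining 20 bits are split 10 + 10 into a high/low surrogate pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-16 into out[0..1].
   Returns the number of 16-bit units written (1 or 2).
   Assumes cp is a valid scalar value: <= 0x10FFFF and not a surrogate. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {                          /* fits in one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00, and U+10FFFF becomes 0xDBFF 0xDFFF.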
The loss of a fixed width is a great pity, as conventional wisdom states
that character string manipulation in programs is a hell of a lot easier
if all the characters are the same size in storage. What we really need
is a 32-bit wchar.
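To make the fixed-width point concrete, here is a small C sketch with hypothetical helper names, assuming UTF-16 text held in uint16_t units (the size of a Windows wchar_t). Counting characters or finding the k-th character means walking the buffer from the start and pairing surrogates; with 32-bit units the k-th character would simply be s[k] and the character count would just be the length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True if units s[i], s[i+1] form a valid high/low surrogate pair. */
static int is_pair(const uint16_t *s, size_t n, size_t i)
{
    return s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < n &&
           s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
}

/* Count characters in n UTF-16 units. A surrogate pair counts once;
   an unpaired surrogate is counted as one character on its own. */
size_t utf16_count_chars(const uint16_t *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (is_pair(s, n, i))
            ++i;                       /* skip the low half of the pair */
        ++count;
    }
    return count;
}

/* Return the k-th character (0-based) of n UTF-16 units, or 0xFFFFFFFF
   if there is none. O(k): it must decode everything before position k. */
uint32_t utf16_char_at(const uint16_t *s, size_t n, size_t k)
{
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        uint32_t cp = s[i];
        if (is_pair(s, n, i)) {        /* recombine the 21-bit code point */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(s[i + 1] - 0xDC00);
            ++i;
        }
        if (k-- == 0)
            return cp;
    }
    return 0xFFFFFFFFu;
}
```

For the four units 0x0041 0xD83D 0xDE00 0x00E9 this reports three characters, the second being U+1F600.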
Cheers,
Daniel.