Re: More MSDN lies: RtlStringCchLength



"Jochen Kalmbach [MVP]" <nospam-Jochen.Kalmbach@xxxxxxxxx> wrote in message
news:%23It%23$PIqFHA.3720@xxxxxxxxxxxxxxxxxxxxxxx
Hi Norman!

Instead of "character" MS should have used "codepoint".

But "codepoint" doesn't mean byte because each character has a codepoint.

No

Wrong.

Codepoint means simply a defined value in a defined code-space.

Right.

So for MBCS (A) the code-space is from 0x00 to 0xFF

Wrong. The code-space depends on which code page is in use. Surely you know that the codepoint for "笹" is larger than 0xFF and the codepoint for "塚" is larger than 0xFF. By the way these characters exist in Chinese too, and in Chinese codepages these characters also have codepoints larger than 0xFF (and probably different from the codepoints that they occupy in codepage 932).
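To make the code-page point concrete, here is a minimal Python sketch (not Windows code). It assumes Python's "cp932" and "gbk" codecs are reasonable stand-ins for the Japanese and Simplified Chinese ANSI code pages; `code_value` is a hypothetical helper name. It prints the encoded value of each character in both code pages so the values can be compared.

```python
def code_value(ch: str, codepage: str) -> int:
    """Return the encoded form of a single character as one integer."""
    return int.from_bytes(ch.encode(codepage), "big")

for ch in "笹塚":
    for codepage in ("cp932", "gbk"):
        # Each of these characters encodes to two bytes, so the value exceeds 0xFF.
        print(f"{ch} in {codepage}: 0x{code_value(ch, codepage):X}")
```

An ASCII character such as "A" still encodes to the single byte 0x41 in both code pages, which is why the 0x00 to 0xFF view only looks right for Western text.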


For wchar (W) the code-space is from 0x0000 to 0xFFFF

Yes (for Microsoft's wchar code-space).

And this is exactly what these functions are using as "characters".

No, because some parts of MSDN use "characters" to really mean "characters".

"byte" is only true for A-versions, for W-version it is false,

You are right. MSDN should say that the A version counts bytes and the W version counts characters.

No. W also does not count characters. It also counts codepoints!!!

Wrong. Wrong. Right. The W version does count characters and the W version also counts codepoints. But the A version does not count codepoints. MSDN lies about the A version.
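A rough model of the counts being argued about, again in Python rather than kernel C. It does not call RtlStringCchLength; `count_a` and `count_w` are hypothetical names that assume the A string is stored in code page 932 and the W string in UTF-16, and that each version reports the number of storage units (bytes or 16-bit values) before the terminator.

```python
def count_a(s: str, codepage: str = "cp932") -> int:
    """Number of bytes the string occupies in the given ANSI code page."""
    return len(s.encode(codepage))

def count_w(s: str) -> int:
    """Number of 16-bit units the string occupies in UTF-16."""
    return len(s.encode("utf-16-le")) // 2

name = "笹塚"
print(len(name))      # 2 characters
print(count_a(name))  # 4 bytes in cp932
print(count_w(name))  # 2 sixteen-bit units
```

Under those assumptions the W count matches the character count for this string, while the A count is the byte count, which is neither the number of characters nor the number of codepoints.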

Unicode currently has a range from 0x0 to 0x10FFFF so it does not fit into
wchar!

True, Microsoft's Unicode is a subset of real Unicode (with a few exceptions, I think).

Therefore you also need to do some kind of multi-codepoint for one
character.

No. To handle all of Unicode's codepoints you need multi-Microsoft-somethings, which Microsoft doesn't handle (with a few exceptions, I think).
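For reference, the "multi-something" in question is a UTF-16 surrogate pair: one Unicode codepoint above 0xFFFF stored as two 16-bit values. A small Python sketch, with `utf16_units` as a hypothetical helper name and U+20000 chosen only as an example codepoint outside the 16-bit range:

```python
def utf16_units(s: str) -> list[int]:
    """Split a string into its UTF-16 16-bit units."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

ch = "\U00020000"  # a single codepoint beyond 0xFFFF
print(len(ch))                              # 1 codepoint
print([hex(u) for u in utf16_units(ch)])    # ['0xd840', '0xdc00']
```

A length routine that simply counts 16-bit values would report 2 for this one-codepoint string.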

MSDN really needs fixing (again).

Yes. It should update the corresponding docs to replace "characters" with "codepoint".

Well, notice that the cited MSDN pages already give two separate function headers instead of trying to unify them the way user-space MSDN pages do. So I thought that it would not be too hard to fix the pages clearly and correctly by giving two separate descriptions too. But you want to see a single word fixed in a single description. In that case, the word "characters" could be replaced by "TCHARs". But I'm still not sure if TCHARs are supposed to exist in kernel mode or not -- although ntddk.h and wdm.h export definitions of some subset of the user-mode TCHAR stuff, it seems that maybe that's a bug and maybe these headers weren't supposed to export any TCHAR definitions.
