Re: More MSDN lies: RtlStringCchLength



Hi Norman!

As I said... it leads to an endless discussion...

You simply mixed up "codepoint" and "character".

>>> But "codepoint" doesn't mean byte because each character has a
>>> codepoint.
>>
>> No
>
> Wrong.

One "character" has *at least* one codepoint.


>> Codepoint means simply a defined value in a defined code-space.
>
> Right.

Hey, great!


>> So for MBCS (A) the code-space is from 0x00 to 0xFF
>
> Wrong. The code-space depends on which code page is in use. Surely you
> know that the codepoint for "笹" is larger than 0xFF and the codepoint
> for "塚" is larger than 0xFF.

As I said: you are mixing up characters and codepoints...

MBCS has a data type of "char", and this is a "byte" (8 bits) on all
Windows platforms; therefore MBCS has a code-space from 0x00 to 0xFF.

> By the way these characters exist in
> Chinese too, and in Chinese codepages these characters also have
> codepoints larger than 0xFF

MBCS codepoints can only be from 0x00 to 0xFF
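
To make the difference concrete, here is a minimal sketch in plain C. It
assumes code page 932 (Shift-JIS), where lead bytes fall in 0x81..0x9F and
0xE0..0xFC; the helper names are made up for illustration and are not CRT
functions. strlen sees the individual char elements, while a
_mbslen-style count sees the characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: lead-byte test, assuming code page 932 (Shift-JIS). */
static int is_cp932_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical helper: counts characters roughly the way _mbslen would
   for code page 932. A lead byte plus its trail byte counts once. */
static size_t cp932_char_count(const char *s)
{
    size_t n = 0;
    while (*s != '\0') {
        if (is_cp932_lead_byte((unsigned char)*s) && s[1] != '\0')
            s += 2;   /* double-byte character */
        else
            s += 1;   /* single-byte character (or truncated lead byte) */
        n++;
    }
    return n;
}
```

For the string 'A' followed by the two bytes 0x82 0xA0 (hiragana "あ" in
Shift-JIS), strlen reports 3 elements and cp932_char_count reports 2
characters. Every single element is still in the range 0x00 to 0xFF.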


>> For wchar (W) the code-space is from 0x0000 to 0xFFFF
>
> Yes (for Microsoft's wchar code-space).
>
>> And this is exactly what this function are using as "characters".
>
> No, because some parts of MSDN use "characters" to really mean
> "characters".

Yes. But this is only a very small set of functions, and only for
MBCS... (like _mbslen).


>>>> "byte" is only true for A-versions, for W-version it is false,
>>>
>>> You are right. MSDN should say that the A version counts bytes and
>>> the W version counts characters.
>>
>> No. W also does not count characters. It also counts codepoints!!!
>
>
> Wrong. Wrong. Right. The W version does count characters and the W
> version also counts codepoints. But the A version does not count
> codepoints. MSDN lies about the A version.

No... the W versions also count only codepoints...
(We could also start discussing composed characters...)
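
A small sketch of the W case, in portable C with uint16_t standing in for
the 16-bit Windows wchar_t (an assumption, so that it compiles anywhere;
the function names are hypothetical). The first function counts 16-bit
values the way a W length function does. The second counts full Unicode
values, which the Unicode standard calls code points (it calls the 16-bit
values "code units").

```c
#include <stddef.h>
#include <stdint.h>

/* Counts 16-bit elements up to the terminator, like wcslen on Windows. */
static size_t utf16_unit_count(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Counts Unicode values: a high surrogate followed by a low surrogate
   is counted once; everything else counts as one. */
static size_t utf16_code_point_count(const uint16_t *s)
{
    size_t n = 0;
    while (*s != 0) {
        if (s[0] >= 0xD800 && s[0] <= 0xDBFF &&
            s[1] >= 0xDC00 && s[1] <= 0xDFFF)
            s += 2;   /* surrogate pair */
        else
            s += 1;   /* BMP value or unpaired surrogate */
        n++;
    }
    return n;
}
```

For 'A' followed by U+1D11E (stored as 0xD834 0xDD1E), the first function
returns 3 and the second returns 2.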

See my new post in
microsoft.public.dotnet.internationalization

>> Unicode has currently a range from 0x0 to 0x10FFFF so it does not fit
>> into wchar!
>
> True, Microsoft's Unicode is a subset of real Unicode (except for a few
> exceptions, I think).

MS has full Unicode support; it simply uses UTF-16 to store the
characters.
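
As a rough illustration of how a value above 0xFFFF fits into 16-bit
storage, here is the standard UTF-16 encoding step written in portable C.
The function name and signature are my own, not a Windows API.

```c
#include <stdint.h>

/* Encodes one Unicode value (0..0x10FFFF, excluding the surrogate range)
   as UTF-16. Returns the number of 16-bit elements written to out
   (1 or 2), or 0 if the input is not a valid value. */
static int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF)
        return 0;                        /* outside the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                        /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;           /* fits in a single element */
        return 1;
    }
    cp -= 0x10000;                       /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

U+20AC comes out as the single element 0x20AC, while U+1D11E comes out as
the two elements 0xD834 0xDD1E.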


>> Therefor you also need to do some kind of mutli-codepoint for one
>> character.
>
> No. To handle all of Unicode's codepoints you need
> multi-Microsoft-somethings, which Microsoft doesn't handle (except for a
> few exceptions, I think).

Have you ever heard of composed characters?
Have you ever read the Unicode spec?
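
A deliberately simplified sketch of what I mean by composed characters. It
assumes that only the block U+0300..U+036F (Combining Diacritical Marks)
is combining; the real rules are in the Unicode text segmentation annex
(UAX #29) and are much more involved. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified assumption: only U+0300..U+036F is treated as combining. */
static int is_combining_mark(uint16_t u)
{
    return u >= 0x0300 && u <= 0x036F;
}

/* Counts base characters; combining marks that follow a base character
   are folded into it. A mark at the very start counts as one character. */
static size_t composed_char_count(const uint16_t *s)
{
    size_t n = 0;
    for (; *s != 0; s++) {
        if (!is_combining_mark(*s) || n == 0)
            n++;
    }
    return n;
}
```

The sequence 'e', U+0301, 'A' is three 16-bit values, but this count gives
two characters, because U+0301 (combining acute accent) belongs to the 'e'.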


> Well, notice that the cited MSDN pages already give two separate function
> headers instead of trying to unify them the way user-space MSDN pages do.

???

> So I thought that it would not be too hard to fix the pages clearly and
> correctly by giving two separate descriptions too.

Why two separate descriptions? Simply replace "characters" with
"codepoints"... (for most functions)

> But you want to see a
> single word fixed in a single description. In that case, the word
> "characters" could be replaced by "TCHARs".

Yes. And TCHAR is simply a codepoint... (and sometimes also a character).

> But I'm still not sure if
> TCHARs are supposed to exist in kernel mode or not

Kernel mode normally only has wchars... (DebugOutput and a few others are
the exceptions)

--
Greetings
Jochen

My blog about Win32 and .NET
http://blog.kalmbachnet.de/