Questions about MSDN for some DDK functions



1.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/kmarch/hh/kmarch/k109_81806669-d742-4cb9-b4dd-d7e40fef171a.xml.asp
says:
Return Value
RtlUpcaseUnicodeChar returns the uppercase version of the specified
Unicode character.

MSDN lists no exceptions. Taken literally, even when the uppercase version of the specified Unicode character takes two Unicode characters to express, RtlUpcaseUnicodeChar returns it in a single WCHAR. That cannot be right, so I have doubts about this definition.
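The doubt can be shown with a small user-mode sketch. It does not call RtlUpcaseUnicodeChar or claim to reproduce its behavior. The names upcase_one and upcase_full are hypothetical, and the mapping covers only ASCII plus U+00DF, whose full uppercase form in Unicode is the two letters "SS".

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical single-unit upcase, shaped like a function that returns
   one WCHAR. It can hand back only one code unit, so U+00DF has no
   correct answer here and is returned unchanged. */
wchar_t upcase_one(wchar_t c)
{
    if (c >= L'a' && c <= L'z')
        return (wchar_t)(c - (L'a' - L'A'));
    return c;
}

/* Hypothetical full upcase: writes one or two code units into out
   and returns how many were written. */
size_t upcase_full(wchar_t c, wchar_t out[2])
{
    if (c == 0x00DF) {          /* LATIN SMALL LETTER SHARP S */
        out[0] = L'S';
        out[1] = L'S';
        return 2;
    }
    out[0] = upcase_one(c);
    return 1;
}
```

A return type of one WCHAR forces the first shape. Only the second shape, with an output buffer and a count, can express a result that is longer than the input.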

2.
http://msdn.microsoft.com/library/en-us/kmarch/hh/kmarch/k109_c1a13e9a-f863-4bcd-ae89-daee0c3d3a4b.xml.asp
says:
Return Value
RtlUpperChar returns the uppercase version of the specified character or
returns the value specified by the caller for Character if the specified
character cannot be converted.

That makes sense. If the specified ANSI character fits in a single byte but its uppercase version needs two ANSI characters, it cannot be converted, and we know what the function will do: return the input unchanged.

The page continues:
RtlUpperChar returns the input Character unconverted if it is the lead
byte of a multibyte character or if the uppercase equivalent of Character
is a double-byte character. To convert such characters, use
RtlUpcaseUnicodeChar.

That makes sense too. If the relevant ANSI character doesn't fit in a single byte then we have to convert the character from ANSI to Unicode, then call RtlUpcaseUnicodeChar, then convert the result from Unicode to ANSI.
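As an illustration of the lead-byte rule, here is a minimal sketch for code page 932 only. It is not the implementation of RtlUpperChar. The function names are hypothetical, and the only facts assumed are the code page 932 lead-byte ranges (0x81-0x9F and 0xE0-0xFC) and that a trail byte can fall in the ASCII lowercase range.

```c
#include <stddef.h>

/* Lead-byte ranges of code page 932 (Shift-JIS). */
int is_cp932_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical byte-wise upcase of a counted buffer. Double-byte
   characters are skipped whole, because a trail byte such as 0x61
   must not be mistaken for the letter 'a'. Only ASCII a-z is mapped. */
void upper_cp932_in_place(unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        if (is_cp932_lead_byte(s[i])) {
            i += 2;             /* leave lead byte and trail byte as-is */
            continue;
        }
        if (s[i] >= 'a' && s[i] <= 'z')
            s[i] = (unsigned char)(s[i] - ('a' - 'A'));
        i++;
    }
}
```

Anything this loop skips would still need the round trip through Unicode described above.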

But something is missing. If the relevant ANSI character does fit in a
single byte but cannot be converted directly, should we in this case also
convert the character from ANSI to Unicode, call RtlUpcaseUnicodeChar, and
convert the result back? I think not, because of problem 1 above. So is
there still no way to do it?

3.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/kmarch/hh/kmarch/k109_c1a13e9a-f863-4bcd-ae89-daee0c3d3a4b.xml.asp
says (for RtlUpperString):
The number of bytes copied from SourceString is either the Length of
SourceString or the MaximumLength of DestinationString, whichever is
smaller.

So even if the MaximumLength of DestinationString is longer than the Length of SourceString and is also long enough to hold the entire uppercase conversion of SourceString, RtlUpperString will still truncate the result at the Length of SourceString, will waste the remaining available space, and will lose some of the characters that should have been converted. Why?
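The quoted rule amounts to the following sketch. CountedString is a hypothetical stand-in for the counted STRING structure (Length, MaximumLength, Buffer), and upper_string_sketch is a hypothetical name. The code models only the documented byte count, with an ASCII-only mapping, and is not the real RtlUpperString.

```c
#include <stddef.h>

/* Hypothetical stand-in for the DDK's counted STRING structure. */
typedef struct {
    unsigned short Length;         /* bytes currently in use */
    unsigned short MaximumLength;  /* bytes available in Buffer */
    char *Buffer;
} CountedString;

/* Copies min(src->Length, dst->MaximumLength) bytes, upcasing ASCII
   letters one byte at a time. The output can never be longer than
   the input, however much room the destination has. */
void upper_string_sketch(CountedString *dst, const CountedString *src)
{
    unsigned short n = src->Length < dst->MaximumLength
                           ? src->Length
                           : dst->MaximumLength;
    unsigned short i;

    for (i = 0; i < n; i++) {
        char c = src->Buffer[i];
        if (c >= 'a' && c <= 'z')
            c = (char)(c - ('a' - 'A'));
        dst->Buffer[i] = c;
    }
    dst->Length = n;
}
```

With a byte count capped at the source Length, a character whose uppercase form needs more bytes than the original has nowhere to go, which is the truncation described above.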

4.
A little bird told me why RtlIsValidOemCharacter isn't documented.  Probably
the thing takes a single byte parameter and returns a single byte result and
is thoroughly incapable of distinguishing valid OEM characters from garbage.
But why is the thing exported?  Are there really some callers that call it?
If so, aren't the callers guaranteed to fail?  Wouldn't it be better to
delete the function RtlIsValidOemCharacter entirely?

(Where I live, Microsoft products default to code page 932 for both ANSI and
OEM.  The contents of boot.ini are read and displayed in Shift-JIS not
Unicode.  I've also seen a Microsoft product with a different default code
page where one particular single byte lowercase letter uppercases to SS, but
didn't experiment with kernel mode programming in it.)
