Re: CommandLineToArgvA?


Vincent Fatica wrote:
> On Sun, 7 Jun 2009 11:44:47 -0400, "Igor Tandetnik"
> <itandetnik@xxxxxxxx> wrote:
>
> | No. But if the current system codepage is in fact CP1253 (the Windows
> | codepage for Greek), and the caller did want to pass some Greek
> | characters to you, you will silently convert them to accented Latin
> | characters that just happen to have the same codes in the Latin-1
> | (ISO-8859-1) codepage, which is what Unicode codepoints U+0000 through
> | U+00FF correspond to, for historical reasons.
> |
> | For example, GREEK CAPITAL LETTER ALPHA is code 193 (hex 0xC1) in
> | CP1253. But you will interpret it as U+00C1, LATIN CAPITAL LETTER A
> | WITH ACUTE.
>
> What's the problem? When I convert each Unicode argv back to MBCS
> with
>
> while ( ( *p++ = (CHAR) *wp++ ) != 0 );
>
> won't it go back to 193 (and again be interpreted as GREEK CAPITAL
> LETTER ALPHA)? I don't think CommandLineToArgvW cares whether it's
> GREEK CAPITAL LETTER ALPHA or LATIN CAPITAL LETTER A WITH ACUTE. I'm
> assuming CommandLineToArgvW only **interprets** whitespace,
> backslashes, and double quotes.

Ah, I didn't realize you were going to Unicode and back. Anyway, you'd
still have problems with true double-byte encodings, like Chinese Big5
or Japanese Shift-JIS. In these encodings, some characters are
represented by two bytes, called the lead byte and the trailing byte.
The lead byte always has the high bit set, but the trailing byte need
not: in both Shift-JIS and Big5 it can fall in the ASCII range 0x40 to
0x7E, which includes 0x5C, the ASCII code for backslash.

Your naive algorithm will convert such a double-byte character to two
independent Unicode codepoints. The codepoint corresponding to the
trailing byte could then be treated by CommandLineToArgvW as a special
character: a trailing 0x5C becomes a backslash, and a backslash in front
of a double quote escapes that quote. As a result, a) argument
boundaries can end up in the wrong place, so a parameter may be broken
up in the middle or merged with its neighbor, and b) when your algorithm
converts back from Unicode to MBCS, you'll end up with a lead byte not
followed by a trailing byte (or followed by an unrelated ASCII character
that will be misinterpreted as a trailing byte).
--
With best wishes,
Igor Tandetnik

With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. It is hard to be sure where they are going to
land, and it could be dangerous sitting under them as they fly
overhead. -- RFC 1925