Re: How many bytes per Italian character?
- From: Joseph M. Newcomer <newcomer@xxxxxxxxxxxx>
- Date: Tue, 10 Apr 2007 12:17:25 -0400
AH! So now you're actually describing the problem! You didn't say that %S was being used
to format an ANSI string, and yes, there's a possibility that it isn't implemented on CE,
but you could tell this very readily by reading the C Runtime source for CE, to see if
they did it right. One of the problems I've found from other people asking CE questions
is that the CE group copied all of the documents from VS, and didn't always update them to
reflect changes in CE, such as unsupported features. When in doubt, read the source.
Somehow, you went from a discussion about the number of bytes in Italian characters to a
discussion about %S without even hinting that an ANSI string was involved in the
conversion. Because of delays in releasing the shift key after typing %, I've found
several situations in which programmers have erroneously typed %S when they meant %s, and
nobody noticed (even me, after staring at their code for half an hour; the first time, I
found it by single-stepping into the CRT to discover it). So the abrupt context switch
was confusing.
MultiByteToWideChar is certainly a good solution.
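A minimal sketch of that approach, under stated assumptions: convert the narrow string to wide first, then format it with a wide-string specifier so the %S question never arises. The helper names narrow_to_wide and format_host_label, the "Host:" label, and the 256-character buffer are hypothetical. The non-Windows branch uses standard C mbstowcs purely as a stand-in so the sketch builds elsewhere. It also assumes a C runtime whose swprintf takes a buffer count, which an older or CE runtime may not provide.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Convert a NUL-terminated narrow string to a wide string.
   Returns 1 on success, 0 if the conversion fails or does not fit. */
static int narrow_to_wide(const char *src, wchar_t *dst, size_t dst_count)
{
    assert(src != NULL && dst != NULL);
#ifdef _WIN32
    /* Passing -1 makes the API process the terminating NUL as well;
       a return value of 0 signals failure. */
    return MultiByteToWideChar(CP_ACP, 0, src, -1, dst, (int)dst_count) > 0;
#else
    /* Standard C stand-in (assumption) for builds off Windows. */
    size_t n = mbstowcs(dst, src, dst_count);
    return n != (size_t)-1 && n < dst_count;
#endif
}

/* Build "Host: <name>" as a wide string from a narrow host name,
   such as the one gethostname returns. Hypothetical helper. */
static int format_host_label(const char *host, wchar_t *out, size_t out_count)
{
    wchar_t wide_host[256];
    if (!narrow_to_wide(host, wide_host, 256))
        return 0;
    /* %ls means "wide string" in the wide printf family. */
    return swprintf(out, out_count, L"Host: %ls", wide_host) >= 0;
}
```

The sketch uses %ls rather than %s because %ls names a wide string in both the Microsoft and the standard wide printf functions, so a slip of the shift key cannot change its meaning.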
More below...
On Tue, 10 Apr 2007 14:49:35 +0900, "Norman Diamond" <ndiamond@xxxxxxxxxxxxxxxx> wrote:
"Joseph M. Newcomer" <newcomer@xxxxxxxxxxxx> wrote in message****
news:erpl13htgnjdfj26jasp16meb2q82oec3i@xxxxxxxxxx
On Tue, 10 Apr 2007 09:52:13 +0900, "Norman Diamond"
<ndiamond@xxxxxxxxxxxxxxxx> wrote:
"Joseph M. Newcomer" <newcomer@xxxxxxxxxxxx> wrote in message****
news:efjk13p1rhh2ospibfvfjt43e1i3u3q0cl@xxxxxxxxxx
Well, the question is misleading, since it says "per Italian character",
and characters in Italian are the same size as characters in English, or
French, or Urdu.
Maybe the question was overspecified. I was afraid that if I just asked
"How many bytes per character" then the tangents would go off into
surrogate pairs (which aren't supported in CE to the best of my knowledge
but might be someday and anyway wouldn't help this issue) instead of
tangents that go off into implementing pure virtual functions ^_^ Also a
few months ago I found the Japanese version of CE barfing all over
Japanese characters when an idiot programmer tries to use the %S format in
StringCchPrintf.
That's because %S is carefully specified to be completely useless.
Huh?????????
It means, in a Unicode app, "format an 8-bit string"
Which is almost exactly[*] true, almost exactly[*] what I needed, which is
almost exactly[*] what it failed to deliver. Even in Windows CE,
gethostname returns an ANSI string, which I needed to insert into a
displayable Unicode string.
and in an ANSI app means "format as a Unicode string". So what you're
saying is that "an incorrectly-written program fails:" for which there is
little sympathy.
Huh. Why do you think that an app which needs to take the result of
gethostname, convert it to Unicode, put it into a displayable Unicode
string, and show it to the user in a message box, is an incorrectly-written
program? Sure if I need sympathy I should look somewhere else, but I fail
to see what is bad about this program. I ended up calling
MultiByteToWideChar[*] and using the %s format specifier instead.
[* To be a bit closer to exactly true, notice that in a Unicode app, %S is
supposed to convert multibyte to Unicode, the same as MultiByteToWideChar
does. One difference is that MultiByteToWideChar actually works. But to be
one further bit closer to exactly true, notice that in MSDN page for the CE
version of StringCchPrintf, %s and %S are specified as being completely
broken in a different way from the actual breakage.]
Also I found Visual Studio 2005 barfing on an English character, the*****
pound sign. So I wanted to make it clear that this was a simpler
question.
Define "barf". What is going wrong, and what is the manifestation of the
error?
The manifestation was:
Huh? So nothing happens, and this defines "barf"?
****
***
And that's all she said. In Visual Studio 2005, I used the mouse to select
a string containing a pound sign and copy it to the clipboard. In Wordpad,
I used the mouse to do a paste from the clipboard. When it reached the
pound sign,
Pasted the pound sign into what? Where? Into the STRINGTABLE? I just did that, and it
works fine. What's WordPad got to do with VS? You still haven't described what you did.
That is, how can I reproduce the problem?
****
****
*****As far as I know, the Registry calls will not cause buffer overruns;
I've not seen any problem in regular Windows,
IRRELEVANT. We've already seen enough evidence that regular Windows and
Windows CE are not bug-for-bug compatible.
So have you detected that WinCE will cause a buffer overrun?
Answered here:
The only evidence we have at the moment is that CE thinks five wchar_t's
occupy twenty bytes. It seems to me that extreme caution is warranted.
I still think extreme caution is warranted.
Though from discussion in the newsgroups where I originally posted the
question, it's really starting to look like this particular breakage is in
Remote Registry Editor rather than in Windows CE. I'm trying to figure out
how to test that.
AH, yet even MORE new and useful information comes out! You have a very complicated set
of interactions here, which you had reduced to a question about how many bytes per Italian
character, without giving all this supplemental information. Yes, add something new to
the mix, and indeed, the whole picture can change. Then you add something about
StringCchPrintf being broken, which turns out to be that %S appears to not work, but you
didn't even say that your purpose was printing an ANSI string, and in any case, it is
almost certainly NOT StringCchPrintf but more likely the C runtime library (just because
you called StringCchPrintf doesn't mean it is the source of the problem). When things are
going weird, complete descriptions of the problem are required.
How to test it:
Write a trivial app that stores a string under a registry key
Write a trivial app that reads that string; display all the useful information, including
all the bytes in hex.
Edit the key with the remote registry editor
Run the read-string app again. See what the bytes are. All the bytes.
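For the "all the bytes in hex" step, a minimal sketch of a dump routine follows. The name hex_dump and its output layout are hypothetical. It deliberately leaves out the registry read itself, so it takes whatever buffer and byte count the read produced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format len bytes as two-digit uppercase hex, separated by spaces.
   Returns the number of characters written (excluding the NUL),
   or -1 if out is too small to hold the whole dump. */
static int hex_dump(const unsigned char *data, size_t len,
                    char *out, size_t out_size)
{
    size_t pos = 0;
    size_t i;
    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (i = 0; i < len; i++) {
        /* Each byte needs at most 3 characters plus the NUL. */
        if (out_size - pos < 4)
            return -1;
        pos += (size_t)snprintf(out + pos, out_size - pos,
                                i ? " %02X" : "%02X", data[i]);
    }
    return (int)pos;
}
```

Assuming little-endian 16-bit characters, a value holding exactly C O M 7 nul would dump as ten bytes, 43 00 4F 00 4D 00 37 00 00 00. Anything beyond that in a twenty-byte value is what the test is meant to reveal.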
*****
****
Hmm. It allocates MORE space than required?
No, it does no allocation whatsoever. It tells me how much it thinks I
should have allocated, 20. It returned ERROR_MORE_DATA to assure me that 10
bytes really really weren't enough.
OK, it tells you that YOU should allocate more space than required. Note that there is no
reason to believe that a string is any particular length. And given the remote editor
issue, it is entirely possible that it stored the L"XXXX______" where each _ represents a
space, or something else strange. So read the 20 bytes and examine them. Display them in
hex. See what they are.
Note that none of this has to do with the number of bytes per Italian character, your
finely-worded question. It may have to do with the remote registry editor, a topic
previously concealed.
****
****
How is this bad?
Because I was hoping to validate the contents of that registry value,
instead of just having to take whatever garbage is in there. We've been
told that a value is going to be preloaded there by another supplier
contracted by the OEM, but it's not there yet, so we load it ourselves using
Remote Registry Editor. I've sometimes been observed making typos, and for
some reason I thought it wouldn't be a bad idea to even validate some
settings that I type in myself.
Perhaps it is allowing for the fact that the characters might require
surrogates?
In that case the answer is 4 bytes per Italian character. Case almost
closed. BUT: as mentioned, the discussion has continued a little bit in
the newsgroups where I first posted this question (international and wince).
Someone else tried to duplicate this test using an English language version
of Remote Registry Editor, and got an answer of 2 bytes per Italian
character.
So you're suggesting that CE stores Italian characters in a different
format than English characters?
I think the format is the same. Most characters used in England came from
Italy, just as most characters used in Japan came from China.
It makes a lot of sense to let it estimate high, since the NUL terminator
I already counted it starting from day 1 of this thread:
C O M 7 nul
five characters, a buffer which I sized at ten bytes (not eight), and
according to RegQueryValueEx, twenty bytes (not sixteen).
The only valid way to handle reading a registry key is to allocate a buffer the size
required by the registry key information, read the data into that, and use that data. I
would never presume that a key was 5 characters; I would query the registry, allocate the
required bytes, read the key value, and then worry about interpreting it.
Just because you would like to think you put 4 characters (plus NUL) in the Registry does
not mean that you ACTUALLY have 4 characters (plus NUL), and since you didn't bother to
actually discover what IS there, you don't really have a basis here for complaining. And
if you follow the correct protocols, which is to allocate a buffer as big as required, and
read into that buffer, it should work.
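Once the value has been read into a buffer of the reported size, the "worry about interpreting it" step could look roughly like the sketch below. The function names are hypothetical, the data is assumed to be little-endian 16-bit characters, and the registry call that fills the buffer is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Count the 16-bit code units that precede the first NUL terminator
   in a raw buffer of nbytes bytes.
   Returns -1 if nbytes is odd or no terminator is found. */
static long utf16le_length(const unsigned char *data, size_t nbytes)
{
    size_t i;
    assert(data != NULL);
    if (nbytes % 2 != 0)
        return -1;
    for (i = 0; i + 1 < nbytes; i += 2) {
        if (data[i] == 0 && data[i + 1] == 0)
            return (long)(i / 2);
    }
    return -1;
}

/* Bytes left over after the first terminator. A nonzero result means
   the value holds more than the one expected string.
   Call only with a non-negative length from utf16le_length. */
static size_t trailing_bytes(size_t nbytes, long length)
{
    assert(length >= 0);
    return nbytes - ((size_t)length + 1) * 2;
}
```

With these, a ten-byte value holding C O M 7 nul gives a length of 4 and no trailing bytes. A twenty-byte value that starts the same way gives a length of 4 and ten trailing bytes, which are the ones worth dumping in hex.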
In the absence of having actually discovered what is coming in, you should not be so
resentful of the fact that reality doesn't match your belief about what reality ought to
be. Perhaps you should be complaining about the remote registry editor, and not about
what is actually contained in the registry. Nor should you assume that there is a bug in CE; it might
be accurately reporting what is erroneously placed in the registry by another program.
If you read all the bytes and display them you might find something interesting, perhaps
even useful.
joe
****
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm