Re: How many bytes per Italian character?


In a general case where I don't know the expected length of the string, yes, I know how to ask Windows CE how big a buffer I'm going to need. In some cases I might still verify the type that comes back to see whether it's really a string, but in the general case I know how to inspect the returned result to find out what type it's registered as today.

Notice that the above is not the question.

In this specific case I knew how long a string to expect. It was UTF-16, MSDN had given me a somewhat misleading idea of how many bytes it should take, and I was planning to do only minimal validation. Any nonzero return value and any length other than 10 bytes could have been treated as an error. Instead I have to write extra garbage code in order to be compatible with broken Windows.
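For illustration, here is a minimal sketch of the strict check described above, assuming "COM6" is read as UTF-16 into a 5-element buffer. The helper name is hypothetical, and the two constants are redefined with their real Windows SDK values (ERROR_SUCCESS is 0, REG_SZ is 1) only so the sketch compiles without <windows.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Real values from the Windows SDK headers, redefined here only so the
   sketch compiles without <windows.h>. */
#define SKETCH_ERROR_SUCCESS 0L
#define SKETCH_REG_SZ        1UL

/* "COM6" plus its NUL terminator is 5 UTF-16 code units, so a
   well-behaved API should report exactly 10 bytes. */
#define EXPECTED_UNITS 5u
#define EXPECTED_BYTES (EXPECTED_UNITS * sizeof(uint16_t))

/* Hypothetical helper: accept only a successful call that returned a
   REG_SZ of exactly the expected size; anything else is an error. */
bool is_expected_com_value(long status, unsigned long type, unsigned long cb)
{
    return status == SKETCH_ERROR_SUCCESS
        && type == SKETCH_REG_SZ
        && cb == EXPECTED_BYTES;
}
```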

This seems to be a lot of concern over a fine point;

Fine. Look at the subject line. I posted a finely worded question to begin with. Fine, neither you nor I nor anyone else knows the answer, but why bother shifting the discussion to a question that is really irrelevant to this thread?


"Joseph M. Newcomer" <newcomer@xxxxxxxxxxxx> wrote in message news:fsmj13t6vkuegtsflg4dklrnimtgtn1hch@xxxxxxxxxx
You ask the Registry how many bytes are required, and you allocate that number of bytes.
Depending on your application (Unicode or ANSI) you will get different answers.
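The ask-then-allocate pattern can be sketched as follows. Since RegQueryValueEx itself only exists on Windows, the sketch uses a hypothetical stand-in, mock_query_value, that assumes the documented protocol: a NULL data pointer just reports the required size, and a too-small buffer yields ERROR_MORE_DATA (234) along with the required size. Real code usually retries in a loop, because the value can change between the two calls.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ERROR_SUCCESS   0L
#define SKETCH_ERROR_MORE_DATA 234L

/* Hypothetical stand-in for RegQueryValueEx: "stores" a UTF-16 string
   (including its NUL) and follows the size protocol described above. */
static long mock_query_value(const uint16_t *stored, size_t stored_units,
                             uint8_t *data, unsigned long *cb)
{
    unsigned long needed = (unsigned long)(stored_units * sizeof(uint16_t));
    if (data == NULL) {            /* size query only */
        *cb = needed;
        return SKETCH_ERROR_SUCCESS;
    }
    if (*cb < needed) {            /* caller's buffer is too small */
        *cb = needed;
        return SKETCH_ERROR_MORE_DATA;
    }
    memcpy(data, stored, needed);
    *cb = needed;
    return SKETCH_ERROR_SUCCESS;
}

/* Two-call pattern: ask for the size, allocate exactly that many bytes,
   then read. Returns a malloc'd buffer (caller frees) or NULL. */
uint8_t *read_value(const uint16_t *stored, size_t stored_units,
                    unsigned long *out_cb)
{
    unsigned long cb = 0;
    if (mock_query_value(stored, stored_units, NULL, &cb) != SKETCH_ERROR_SUCCESS)
        return NULL;
    uint8_t *buf = malloc(cb ? cb : 1);
    if (buf == NULL)
        return NULL;
    if (mock_query_value(stored, stored_units, buf, &cb) != SKETCH_ERROR_SUCCESS) {
        free(buf);
        return NULL;
    }
    *out_cb = cb;
    return buf;
}
```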

Simplistically, since the name is in the U+0000..U+00FF range, you need 10 bytes to hold 5
Unicode characters. If surrogates could be involved, you would no longer have a one-to-one
correspondence between characters and 16-bit code units.
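To make the surrogate point concrete, here is a small sketch that counts the UTF-16 bytes for a sequence of code points. The helper names are hypothetical, and the input is assumed to contain only valid Unicode scalar values. Code points up to U+FFFF, including Italian accented letters such as U+00E0, take one 16-bit unit; anything above takes a surrogate pair.

```c
#include <stddef.h>
#include <stdint.h>

/* One UTF-16 code unit for BMP code points (U+0000..U+FFFF),
   a surrogate pair for supplementary ones (U+10000..U+10FFFF). */
static size_t utf16_units(uint32_t cp)
{
    return cp > 0xFFFFu ? 2u : 1u;
}

/* Bytes needed to store n code points as UTF-16 plus a NUL terminator. */
size_t utf16_bytes_with_nul(const uint32_t *cps, size_t n)
{
    size_t units = 1; /* the terminating NUL */
    for (size_t i = 0; i < n; i++)
        units += utf16_units(cps[i]);
    return units * sizeof(uint16_t);
}
```

For "COM6" this gives (4 + 1) x 2 = 10 bytes; a single supplementary character such as U+1F600 would add 4 bytes instead of 2.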

This seems to be a lot of concern over a fine point; you ask how big a buffer you need,
you allocate it, you fill it, and if it is a string, there should be a NUL character at
the end of the string, and that determines how many code points you need.
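A sketch of that tolerant approach: treat the byte count returned by the API as an upper bound and scan for the NUL. The RegQueryValueEx documentation warns that a REG_SZ value may have been stored without a proper terminator, so the scan never reads past the reported size. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Length in UTF-16 code units up to the first NUL, never reading past
   cb bytes. Sets *terminated to 1 if a NUL was found, 0 otherwise. */
size_t bounded_utf16_len(const uint16_t *buf, size_t cb, int *terminated)
{
    size_t max_units = cb / sizeof(uint16_t); /* an odd trailing byte is ignored */
    for (size_t i = 0; i < max_units; i++) {
        if (buf[i] == 0) {
            *terminated = 1;
            return i;
        }
    }
    *terminated = 0;
    return max_units;
}
```

With a reported size of 20 bytes and "COM6" in the buffer, this still yields a length of 4.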
joe

On Mon, 9 Apr 2007 11:12:45 +0900, "Norman Diamond" <ndiamond@xxxxxxxxxxxxxxxx> wrote:

"Joseph M. Newcomer" <newcomer@xxxxxxxxxxxx> wrote in message
news:bioc13lk3u38idqh6idp35skdt38bptm4c@xxxxxxxxxx
On Fri, 6 Apr 2007 22:33:53 +0900, "Norman Diamond"
<ndiamond@xxxxxxxxxxxxxxxx> wrote:
"MrAsm" <mrasm@xxxxxxx> wrote in message
news:tm7c13dfq7b973pll4d1refk3f567uebuo@xxxxxxxxxx
On Fri, 6 Apr 2007 17:15:56 +0900, "Norman Diamond"
<ndiamond@xxxxxxxxxxxxxxxx> wrote:

My value in the registry is a string, "COM6", 5 characters long:
C O M 6 nul
How many bytes does this need? Let's ask Windows CE 5. Better not do
this too soon after eating though.

To repeat, these were 5 characters, and how many bytes do they need?

dwRegQueryValueExBug has value ERROR_MORE_DATA.
dwType has value REG_SZ.
dwSize has value 20.
szBuffer contains 5 characters: C O M 6 nul

So, do they use UTF-32 internally, so each character is 4 bytes, and we
have 5*4 = 20 bytes?
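The size arithmetic for the candidate encodings can be checked directly with C11 string-literal prefixes, since sizeof on a literal includes the terminator: 5 bytes as UTF-8, 10 as UTF-16, 20 as UTF-32. This assumes the usual 16-bit char16_t and 32-bit char32_t. Note that L"COM6" is 10 bytes with a 16-bit wchar_t, as on Windows and Windows CE, but 20 bytes with glibc's 32-bit wchar_t.

```c
#include <stddef.h>

/* "COM6" plus its NUL terminator in three encodings. */
const size_t bytes_utf8  = sizeof(u8"COM6"); /* 5 units x 1 byte  */
const size_t bytes_utf16 = sizeof(u"COM6");  /* 5 units x 2 bytes */
const size_t bytes_utf32 = sizeof(U"COM6");  /* 5 units x 4 bytes */
```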

Even if they do, why would the result try to store 20 bytes?
****
If you ask how big the value needs to be (by passing a NULL buffer pointer), it
gives you a size. You allocate that amount of space. That gives a correctly
sized buffer.

To repeat, these were 5 characters, and how many bytes do they need?

But this would be a very strange thing: where is that documented?

The part of MSDN which is 75% accurate is the part which is documented.
No one knows the accuracy rate of the part which isn't documented.
****
Actually, the issue of "accurate" is not the same as "complete". And
there are days when I think 75% is high for "complete".

Thank you for agreeing with me on that tangential point.

And moreover why the final buffer contains just the 5 chars?

Good question. I assumed that the final buffer contains just the 5
wchar_ts because I only provided that much space. I passed a parameter
saying 10 bytes (10 chars in C/C++ speak). But how do we know whether the
API overwrote an additional 10 bytes of my memory? It returned
ERROR_MORE_DATA and a count of 20, so how do we know what else it did?
Yeah, good question.
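One way to find out "what else it did" is to follow the buffer with guard bytes: allocate more than you declare to the API, fill the extra region with a known pattern, and check it afterward. This is a sketch with hypothetical names; it only catches writes that land inside the guard region and change a byte, so a clean result is evidence rather than proof.

```c
#include <stddef.h>
#include <string.h>

#define GUARD_LEN  16
#define GUARD_BYTE 0xA5

/* Fill the guard region that follows the usable part of buf. The caller
   allocates usable + GUARD_LEN bytes but tells the API only "usable". */
void guard_init(unsigned char *buf, size_t usable)
{
    memset(buf + usable, GUARD_BYTE, GUARD_LEN);
}

/* Returns 1 if every guard byte is intact, 0 if something wrote past
   the size it was given. */
int guard_intact(const unsigned char *buf, size_t usable)
{
    for (size_t i = 0; i < GUARD_LEN; i++)
        if (buf[usable + i] != GUARD_BYTE)
            return 0;
    return 1;
}
```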
****
10 bytes is not 10 characters in C/C++ speak. 10 bytes is 10 bytes.

Yes. I expected 5 characters to take 10 bytes. To repeat, these were 5
characters, and how many bytes do they need?

Characters are an interpretation of bytes, but the Registry APIs work in
terms of bytes.

Yes, that's why my code was written the way it was.

'char' is a misnomer,

Yes, because of historical reasons. I'll cut off my tongue to avoid saying
that C invented characters before China did, because that would detract from
the problem here.

because it really is 'signed byte',

No. Each implementation gets to decide whether plain char is a signed byte
or an unsigned byte. Programs have to prepare for either possibility. But
again, this detracts from the problem here.
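Whether plain char is signed on a given implementation can be checked from <limits.h>: CHAR_MIN is 0 when plain char is unsigned and equals SCHAR_MIN when it is signed. A minimal sketch (the function name is hypothetical):

```c
#include <limits.h>

/* Plain char's signedness is implementation-defined; CHAR_MIN reveals it. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

For example, MSVC makes plain char signed by default (its /J option makes it unsigned), while GCC targeting ARM Linux makes it unsigned.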

Any idea why 20 bytes is the right size for 5 single-codepoint UTF-16
characters?
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
