Re: How many bytes per Italian character?



"Joseph M. Newcomer" <newcomer@xxxxxxxxxxxx> wrote in message news:bioc13lk3u38idqh6idp35skdt38bptm4c@xxxxxxxxxx
On Fri, 6 Apr 2007 22:33:53 +0900, "Norman Diamond" <ndiamond@xxxxxxxxxxxxxxxx> wrote:
"MrAsm" <mrasm@xxxxxxx> wrote in message
news:tm7c13dfq7b973pll4d1refk3f567uebuo@xxxxxxxxxx
On Fri, 6 Apr 2007 17:15:56 +0900, "Norman Diamond"
<ndiamond@xxxxxxxxxxxxxxxx> wrote:

My value in the registry is a string, "COM6", 5 characters long:
C O M 6 nul
How many bytes does this need? Let's ask Windows CE 5. Better not do
this too soon after eating though.

To repeat, these were 5 characters, and how many bytes do they need?

dwRegQueryValueExBug has value ERROR_MORE_DATA.
dwType has value REG_SZ.
dwSize has value 20.
szBuffer contains 5 characters: C O M 6 nul

So, do they use UTF-32 internally, so each character is 4 bytes, and we have 5*4 = 20 bytes?
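For reference, the arithmetic behind that guess can be sketched in portable C. This is only an illustration: the helper names utf16_bytes and utf32_bytes are made up for this note, and it assumes every character is a single code unit (true for "COM6" plus its terminator).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not part of any Windows API.
   Both assume one code unit per character. */

/* Bytes needed for n UTF-16 code units (2 bytes each). */
static size_t utf16_bytes(size_t code_units)
{
    return code_units * sizeof(uint16_t);
}

/* Bytes needed for n UTF-32 code points (4 bytes each). */
static size_t utf32_bytes(size_t code_points)
{
    return code_points * sizeof(uint32_t);
}

/* "COM6" plus the terminating nul is 5 characters. */
static void check_com6_sizes(void)
{
    assert(utf16_bytes(5) == 10);  /* what the caller expected */
    assert(utf32_bytes(5) == 20);  /* matches the reported dwSize */
}
```

So 20 is consistent with 4 bytes per character, but the sketch says nothing about whether Windows CE actually stores the value that way.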

Even if they do, why would the result try to store 20 bytes?
****
If you ask how big the value needs to be (by passing a buffer pointer of NULL), it gives you a size in bytes. You allocate that amount of space. That gives you a buffer of the correct size.
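A minimal sketch of that ask-then-allocate pattern follows. It is deliberately not the Win32 API: query_value, read_value, QUERY_OK and QUERY_MORE_DATA are hypothetical stand-ins that only mimic the contract described above (sizes in bytes, NULL buffer means "tell me the size"), so that the example compiles anywhere. The stand-in stores "COM6" as five 16-bit units, so it reports 10 bytes; it does not reproduce the 20 that Windows CE returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes and query function; NOT the Win32 API. */
enum { QUERY_OK = 0, QUERY_MORE_DATA = 1 };

/* The stored value: "COM6" plus terminator, as 16-bit code units. */
static const uint16_t stored_value[] = { 'C', 'O', 'M', '6', 0 };

/* All sizes are byte counts.
   buf == NULL      : store the required size in *size, return QUERY_OK.
   *size too small  : store the required size, return QUERY_MORE_DATA,
                      leave buf untouched.
   otherwise        : copy the value, store the bytes written. */
static int query_value(void *buf, size_t *size)
{
    size_t needed = sizeof stored_value;

    assert(size != NULL);
    if (buf == NULL) {
        *size = needed;
        return QUERY_OK;
    }
    if (*size < needed) {
        *size = needed;
        return QUERY_MORE_DATA;
    }
    memcpy(buf, stored_value, needed);
    *size = needed;
    return QUERY_OK;
}

/* Two-call pattern: ask for the size, allocate, then fetch.
   Returns a malloc'd buffer the caller must free, or NULL on failure. */
static void *read_value(size_t *size_out)
{
    size_t size = 0;
    void *buf;

    if (query_value(NULL, &size) != QUERY_OK)
        return NULL;
    buf = malloc(size);
    if (buf == NULL)
        return NULL;
    if (query_value(buf, &size) != QUERY_OK) {
        free(buf);
        return NULL;
    }
    *size_out = size;
    return buf;
}
```

With a real registry call the same shape applies, except that the value can change between the two calls, so production code would normally loop while the second call still reports that more data is needed.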

To repeat, these were 5 characters, and how many bytes do they need?

But this would be a very strange thing: where is that documented?

The part of MSDN which is 75% accurate is the part which is documented. No one knows the accuracy rate of the part which isn't documented.
****
Actually, "accurate" is not the same issue as "complete". And there are days I think 75% is high for "complete".

Thank you for agreeing with me on that tangential point.

And moreover, why does the final buffer contain just the 5 chars?

Good question. I assumed that the final buffer contains just the 5 wchar_ts because I only provided that much space. I passed a parameter saying 10 bytes (10 chars in C/C++ speak). But how do we know whether the API overwrote an additional 10 bytes of my memory? It returned ERROR_MORE_DATA and a count of 20, so how do we know what else it did? Yeah, good question.
****
10 bytes is not 10 characters in C/C++ speak. 10 bytes is 10 bytes.

Yes. I expected 5 characters to take 10 bytes. To repeat, these were 5 characters, and how many bytes do they need?

Characters are an interpretation of bytes, but the Registry APIs work in terms of bytes.

Yes, that's why my code was written the way it was.

'char' is a misnomer,

Yes, because of historical reasons. I'll cut off my tongue to avoid saying that C invented characters before China did, because that would detract from the problem here.

because it really is 'signed byte',

No. Each implementation gets to decide whether plain char is a signed byte or an unsigned byte. Programs have to prepare for either possibility. But again, this detracts from the problem here.
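As an aside, a short sketch of what "prepare for either possibility" can look like in C. The function names are invented for this note; CHAR_MIN and UCHAR_MAX are the standard <limits.h> macros. The byte 0xE8 is used because it is 'è' in ISO 8859-1, which is the kind of Italian character the subject line asks about.

```c
#include <assert.h>
#include <limits.h>

/* Nonzero if plain char is a signed type on this implementation.
   CHAR_MIN is 0 when plain char is unsigned, negative otherwise. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Map a plain char to its byte value in 0..UCHAR_MAX, whichever
   signedness the implementation chose. Converting through
   unsigned char first avoids a negative result, which would be
   unsafe as a table index or as an argument to the <ctype.h>
   functions. */
static int byte_value(char c)
{
    int v = (unsigned char)c;

    assert(v >= 0 && v <= UCHAR_MAX);
    return v;
}
```

For example, byte_value('\xE8') yields 232 whether plain char is signed or unsigned, whereas using the char directly as an int gives -24 on the common implementations where plain char is signed.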

Any idea why 20 bytes is the right size for 5 single-codepoint UTF-16 characters?
