Re: CString and UTF-8
- From: "Sunray" <Sunray@xxxxxxxxxxxxxxxxx>
- Date: Tue, 19 Dec 2006 15:53:24 -0000
"David Wilkinson" <no-reply@xxxxxxxxxxxx> wrote in message
news:ej7sV%231IHHA.536@xxxxxxxxxxxxxxxxxxxxxxx
Sunray wrote:
It does. CString will try to interpret the multibyte characters, as it's
based upon the CRT, which does that. I refer the reader to the
internationalisation section of MSDN, which I spent all day reading
yesterday. If it didn't, why would my code work in the English version of
Windows and not in the Japanese version?
The locale it is in determines how this behaves. I suggest you try
installing a machine that isn't a standard locale and see how CString
behaves before saying what you have said. My code example does *exactly*
what you are saying it doesn't do. CString treats the UTF-8 characters
as multibyte characters; there are two Chinese characters in there.
Unfortunately, because of the code page, the way it does this is wrong,
causing it to miss the quotes. What it's doing is only useful if I am
trying to process just a multibyte string. What I am processing is an
ASCII string with UTF-8 embedded in it, delimited with quotes. The
UTF-8 does not contain quotes (that's a precondition), so its trying to do
this is irrelevant and clearly unhelpful in this instance.
Sunray:
Not sure what you are saying here. Does CString::GetLength() in an ANSI build
just return the number of bytes or not? In my understanding, it does,
regardless of whether the string is a valid UTF-8 string, local MBCS
string, or invalid. [Likewise in Unicode mode it just returns the number
of wchar_t's, not the number of characters.]
I must say I was always very confused by this, because the CString
documentation somehow implies that CString is MBCS-aware. But apart from
conversion constructors to UTF-8 and esoteric features of
CString::Compare() I don't think it is.
David Wilkinson
It always appears to return the length in bytes of the string. It wouldn't
make sense to do anything else because you'd lose information if you wanted
to process the chars yourself.
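
To illustrate the byte-count point, here is a minimal sketch. It uses
std::string as a stand-in for an ANSI-build CString (an assumption; this is
not MFC code), and utf8_code_points and sample are hypothetical names
introduced only for this example.

```cpp
#include <cstddef>
#include <string>

// Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
// Assumes the input is well-formed UTF-8; no validation is attempted.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Hypothetical sample: two Chinese characters (U+4E2D, U+6587) as raw UTF-8.
const std::string sample = "\xE4\xB8\xAD\xE6\x96\x87";
```

Here sample.size() is 6 (bytes) while utf8_code_points(sample) is 2, which
is the distinction being drawn: a byte-oriented length tells you nothing
about how many characters the bytes encode.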
What seems to be a problem for me is when I use the Find function on a
code page 932 machine with UTF-8 characters in the string. It appears to be
searching through the string in an MBCS way, and with the string of
characters I provided (which I know is missing the terminator) it's
expecting Japanese characters, which means it skips the quotes. Perhaps it's
expecting another byte to complete a Japanese character. The character set
it is using for this is not UTF-8. If it were, that string would be read as
valid UTF-8 Chinese characters.
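
The effect described above can be reproduced without MFC. The following is
a hand-rolled simulation of lead-byte scanning, not the actual CRT or
CString::Find implementation. The names is_cp932_lead_byte, mbcs_find and
line are hypothetical; the lead-byte ranges 0x81-0x9F and 0xE0-0xFC are
those of code page 932.

```cpp
#include <cstddef>
#include <string>

// Lead-byte ranges of code page 932 (Shift-JIS): 0x81-0x9F and 0xE0-0xFC.
bool is_cp932_lead_byte(unsigned char c) {
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

// Hypothetical stand-in for an MBCS-aware character search: it steps over
// each lead byte together with the byte after it, so a byte in the trail
// position is never compared against the target.
std::size_t mbcs_find(const std::string& s, char target, std::size_t start = 0) {
    for (std::size_t i = start; i < s.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (is_cp932_lead_byte(c)) {
            ++i;  // swallow the following byte as a trail byte
            continue;
        }
        if (s[i] == target) return i;
    }
    return std::string::npos;
}

// Hypothetical input: ASCII with quoted UTF-8 (E4 B8 AD E6 96 87).
const std::string line = "name=\"\xE4\xB8\xAD\xE6\x96\x87\" lang=\"zh\"";
```

Searching for the closing quote from offset 6, the plain byte search
line.find('"', 6) returns 12, whereas mbcs_find(line, '"', 6) returns 19:
the final UTF-8 byte 0x87 falls in a code page 932 lead-byte range, so the
quote right after it is consumed as a trail byte and the search lands on a
later quote instead.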
Setting the LC_CTYPE locale category to code page 1252 and then getting the
multibyte code page stops this behaviour. I'm hoping that this will solve a
lot of problems for me.
Alex