Re: MultibyteToWideChar not working properly?
- From: Joseph M. Newcomer <newcomer@xxxxxxxxxxxx>
- Date: Thu, 24 Apr 2008 16:47:18 -0400
See below...
On Thu, 24 Apr 2008 12:14:10 -0700 (PDT), "ALEKS!" <aleksander.morgado@xxxxxxxxx> wrote:
Hi all,
I am trying to use MultiByteToWideChar function to detect invalid
encoded strings (in US-ASCII 7-bit for example). The input string I
use is UTF-8 encoded E2:82:AC (euro sign), which is not a valid
ASCII-7 string.
The call is as follows:
char *input_data = "\342\202\254"; /* octal */
If you have data that you think of in hex, converting it to octal is a bit roundabout; why
not write
"\xE2\x82\xAC"
?
****
long n_out = MultiByteToWideChar((UINT)20127, MB_ERR_INVALID_CHARS,
input_data, 3, NULL, 0);
What is 20127? Some random magical number? Perhaps a comment that this is US-ASCII 7-bit
code page, or a #define or static const UINT, would have helped...
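One way to follow that suggestion is to give the code page a name. This is a minimal sketch; the identifier CP_US_ASCII_7BIT is made up for illustration and is not a constant from the Windows SDK headers.

```c
/* Hypothetical name, chosen for illustration; not defined by the Windows SDK.
   20127 is the Windows code page identifier for US-ASCII (7-bit). */
#define CP_US_ASCII_7BIT 20127u
```

With such a name in place, the call would read MultiByteToWideChar(CP_US_ASCII_7BIT, MB_ERR_INVALID_CHARS, ...), and the reader no longer has to look the number up.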
****
****
When passing NULL in lpWideCharStr and 0 in cchWideChar I am querying
the function to get the number of the output wide chars after
conversion. What I expected was a return of 0 in n_out, as the input
string is not a valid ASCII-7 string, but the output I get is 3,
which means that 3 wide chars are obtained.
That's probably right.
****
****
If I pass a valid output place to store the output string, I get this
UTF-16LE string: 62:00:02:00:2C:00, which seems the string that should
be obtained when only reading the 7bits of each input byte. But: why
is the MSB not read?
Probably because you told it not to read it! You DID say that it is 7-bit data, so it
used only the low-order 7 bits. It will NOT treat what you clearly told it as 7-bit data
as UTF-8 encoding. So it is doing precisely the correct thing, using precisely the data
you told it to use.
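The arithmetic is easy to check by hand. The sketch below only models the behaviour described here (clearing the high-order bit of each byte); it is not a claim about how Windows implements the conversion internally, and the function name strip_high_bit is invented for the example.

```c
#include <stddef.h>

/* Copy n bytes from in to out, clearing the high-order bit of each one.
   This models the observed result only; it is not the Windows implementation. */
static void strip_high_bit(const unsigned char *in, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned char)(in[i] & 0x7F);
}
```

Feeding it the bytes E2 82 AC gives 62 02 2C, which are exactly the low bytes of the UTF-16LE output 62:00:02:00:2C:00 reported in the question.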
****
Why don't I get an ERROR_NO_UNICODE_TRANSLATION or
such? Why does it work?
It works because it is supposed to. It is doing what you asked.
If your input string is encoded in UTF-8, then the ONLY code page you can use for the
translation is CP_UTF8. You will convert it to Unicode.
Now, you can ask it to convert the Unicode back to 20127 (US-ASCII 7-bit), and if there is
an illegal character, it will indicate that there is a problem, because
WideCharToMultiByte will set the LPBOOL parameter (lpUsedDefaultChar) to indicate that there
was a translation error.
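The same two-step idea (decode the input in the encoding it is really in, then ask whether each character exists in the target encoding) can be sketched in portable C. This is a stand-in, not the Win32 calls themselves: utf8_decode_one plays roughly the role of MultiByteToWideChar with CP_UTF8, and the range test plays roughly the role of the used-default flag reported by WideCharToMultiByte. Both function names are invented for illustration, and the target is assumed to be 7-bit ASCII only.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (len bytes available) into *cp.
   Returns the number of bytes consumed, or 0 if the sequence is malformed
   (bad lead or continuation byte, truncation, overlong form, surrogate,
   or a value above U+10FFFF). */
static size_t utf8_decode_one(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t extra;    /* continuation bytes expected after the lead byte */
    uint32_t min;    /* smallest value allowed for a sequence of this length */
    uint32_t value;

    if (len == 0) return 0;
    if (s[0] < 0x80)                { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { extra = 1; min = 0x80;    value = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { extra = 2; min = 0x800;   value = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { extra = 3; min = 0x10000; value = s[0] & 0x07; }
    else return 0;

    if (len < extra + 1) return 0;
    for (size_t i = 1; i <= extra; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        value = (value << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (value < min || value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF))
        return 0;
    *cp = value;
    return extra + 1;
}

/* Step 1: decode the input as UTF-8.
   Step 2: check that every code point exists in the target encoding
   (assumed here to be 7-bit ASCII, U+0000..U+007F).
   Returns 1 if everything fits, 0 if some character has no ASCII
   equivalent, -1 if the input is not valid UTF-8. */
static int utf8_fits_ascii7(const char *text, size_t len)
{
    const unsigned char *p = (const unsigned char *)text;
    int fits = 1;

    while (len > 0) {
        uint32_t cp;
        size_t used = utf8_decode_one(p, len, &cp);
        if (used == 0) return -1;
        if (cp > 0x7F) fits = 0;
        p += used;
        len -= used;
    }
    return fits;
}
```

For the euro sign, utf8_fits_ascii7("\xE2\x82\xAC", 3) decodes U+20AC and returns 0, meaning the input is well-formed UTF-8 but cannot be represented in 7-bit ASCII. Supporting other target encodings would mean replacing the range test with a lookup for that encoding, which is what the Win32 conversion does for you.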
****
****
The idea is to detect if a given string is valid in a given encoding,
not only ASCII-7.
You can't ask it to treat UTF-8 as ASCII-7 and expect that it will translate correctly. It
will do exactly what you asked, which is to treat the input string as a sequence of 7-bit
ASCII bytes, which it does by ignoring the high-order bit (which is probably the parity
bit).
By the way, I tried the technique I suggest above using my Locale Explorer (which you can
download from my MVP Tips site; just select the MultiByte tab) and it returns ? for the
result. If you set the lpUsedDefault radio button to "variable" it will actually tell you
that it set this value to TRUE.
joe
****
Joseph M. Newcomer [MVP]
Thanks for the help in advance.
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm