Re: Proposal to extend documentation about interop


From: John Allberg (smudasmuda_at_nospam.nospam)
Date: 10/21/04


Date: Thu, 21 Oct 2004 14:02:57 +0200

You're right, I'm probably using one of those, whichever is the default for
"string". Since the field sits in a structure within an array, I can't use a
simple array of bytes (which is what the conversion to UTF-8 returns); I have
to use "String".

Frankly, I have no idea which default conversion is used for "string", but it
is received in C as char*.
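
A rough byte-level sketch of what a Unicode-to-ANSI conversion does to such a
string. It is written in Python rather than C#, it assumes the ANSI code page
is Windows-1252 (an assumption; the actual code page depends on the machine),
and marshal_as_ansi and is_valid_utf8 are made-up helper names that only mimic
the conversion step. They are not .NET APIs.

```python
# Assumption: the machine's ANSI code page is Windows-1252.
ANSI_CODE_PAGE = "cp1252"


def marshal_as_ansi(text: str) -> bytes:
    """Mimic a Unicode-to-ANSI conversion that yields a C-style char*.

    Each character is encoded in the ANSI code page and a terminating
    NUL byte is appended, as a C function would expect.
    """
    return text.encode(ANSI_CODE_PAGE) + b"\x00"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

With these helpers, plain ASCII text passes through unchanged, while text
containing characters such as ÅÄÖ arrives as single ANSI bytes, which a
receiver expecting UTF-8 cannot decode.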

I don't care whether Win32 supports UTF-8; I have already done the
conversion in .NET, and the DLL I'm calling has native support for UTF-8.
That's not the issue.

My point is that when reading about p/invoke in MSDN, I find no information
stating that a character conversion takes place. The only mention of
character sets is DllImportAttribute.CharSet, but that is more about which
entry point to use.

I still think that the documentation should be updated to clearly state that
automatic p/invoke marshalling converts strings from Unicode to whatever
character set the Windows machine is using.
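
A byte-level sketch of the round trip described in the original post quoted
below: the UTF-8 bytes are decoded with the "Default" encoding so that the
later Unicode-to-"Default" conversion reproduces them. This is Python, not the
C# code, and it assumes "Default" is Windows-1252; both function names are
hypothetical and only simulate the two conversion steps.

```python
# Assumption: the "Default" (ANSI) encoding is Windows-1252.
ANSI_CODE_PAGE = "cp1252"


def wrap_utf8_for_marshalling(text: str) -> str:
    """Encode text as UTF-8, then reinterpret those bytes as an ANSI string.

    The result looks garbled on screen, but each of its characters maps
    back to exactly one of the original UTF-8 bytes.
    """
    utf8_bytes = text.encode("utf-8")
    return utf8_bytes.decode(ANSI_CODE_PAGE)


def simulate_ansi_marshalling(text: str) -> bytes:
    """Stand-in for the Unicode-to-ANSI conversion applied to the string."""
    return text.encode(ANSI_CODE_PAGE)
```

Passing the wrapped string through the simulated conversion yields the
original UTF-8 bytes again. One limitation of the sketch: Python's cp1252
codec leaves bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so text whose
UTF-8 form contains one of them raises an error here, and the behaviour on
Windows may differ.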

Regards,

John Allberg

"Robert Jordan" <robertj@gmx.net> wrote in message
news:cl43uh$5u4$03$1@news.t-online.com...
> John Allberg wrote:
>
>> Hi!
>>
>> I think the MSDN docs about interop don't state clearly enough that
>> a character encoding conversion is automatically done from Unicode to
>> the character set of the computer during interop.
>>
>> This got me really puzzled for a few days.
>>
>> I've got a legacy C application (DLL) that takes in UTF-8 encoded strings
>> in an array of structs. I call that C DLL from C#, which works fine as
>> long as I use ANSI characters, for example English. When sending in
>> Swedish characters (where the UTF-8 encoding becomes two bytes) such as
>> åäö the lowercase works fine, but the uppercase ÅÄÖ simply comes out as
>> invalid UTF-8 encoding of the character FF.
>>
>> I solved it by converting the string to UTF-8 bytes and then, when going
>> back to a string, using the encoder for "Default" to turn those bytes into
>> a "unicode-string". That way, when the interop converts the string from
>> Unicode to "Default", the UTF-8 once again surfaces.
>>
>> So my suggestion is to update the MSDN doc to state this conversion
>> clearly enough.
>
> The automatic p/invoke conversion can be applied only to these
> legacy types:
>
> - LPSTR (ANSI encoding, 1 byte)
> - LPWSTR (Unicode encoding, 2 bytes)
> - LPTSTR (platform specific, one of the above)
> - BSTR
>
> You cannot properly import UTF-8 because Win32 doesn't support
> UTF-8 for the legacy API either.
>
> bye
> Rob


