Re: Is the following little function UNICODE-safe? ...

Several problems, see below...
On Wed, 26 Mar 2008 10:32:51 -0700 (PDT), ".rhavin grobert" <clqrq@xxxxxxxx> wrote:

I'm trying to write my code in a way that doesn't make it very hard
to switch to UNICODE later,
but sometimes I have functions that will be called very often, so they have
to be *fast*.

May I expect that a char[] is correctly translated when this function is
called in a UNICODE environment, or is some _T() magic required?

//-----------------------------------------------------------------------
// get the default extension associated with the files type
LPCTSTR CInfFile::ExtensionGetDefault() const {
DWORD dwDefID = CFZFile::IdentifierGetDefaults(TypeGet());
****
What is this supposed to mean? Note that in ANSI, this would represent three characters
terminated by a NUL character, but in Unicode, it would represent a single character at
most, and the proper termination does not exist.
****
dwDefID &= 0x00FFFFFF;
****
This mask makes it obvious that you think you are returning 8-bit characters. That
cannot work in a Unicode build.
****
return reinterpret_cast<char*>(&dwDefID);
****
This is incorrect. Your prototype CLEARLY states that the return type is "LPCTSTR",
yet here you use the wrong data type (char) and drop the const qualifier. The
OBVIOUS correction would be
return reinterpret_cast<LPCTSTR>(&dwDefID);
but in fact the problem is FAR worse than this.

You have used a local variable in a function, and you are returning a pointer to a local
variable. The value is COMPLETELY WITHOUT MEANING upon return!

A DWORD can hold at most three 8-bit characters or one 16-bit character, plus the
terminating NUL in either case.
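
To make that arithmetic concrete, here is a small portable sketch. The names Dword, CharsThatFit and HasWideTerminator are made up for illustration, and std::uint32_t and char16_t are only stand-ins for the Windows DWORD and a 16-bit TCHAR; none of this comes from the original code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for the Windows DWORD type (an assumption, for portability).
using Dword = std::uint32_t;

// Number of characters of type CharT that fit in a Dword while still
// leaving room for one terminating NUL of the same type.
template <typename CharT>
constexpr std::size_t CharsThatFit() {
    return sizeof(Dword) / sizeof(CharT) - 1;
}

static_assert(CharsThatFit<char>() == 3, "three 8-bit characters plus NUL");
static_assert(CharsThatFit<char16_t>() == 1, "one 16-bit character plus NUL");

// View the masked value as two 16-bit units, the way a Unicode build would
// read it, and report whether either unit could act as a terminator.
inline bool HasWideTerminator(Dword value) {
    char16_t units[2];
    std::memcpy(units, &value, sizeof value);
    return units[0] == 0 || units[1] == 0;
}
```

With a value such as 0x00747874 (the bytes of "txt" packed into three bytes), HasWideTerminator returns false: read as 16-bit units, the masked DWORD contains no NUL at all, so a wide-string reader would run off the end of it.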

Why do you think file extensions are 3 characters long? What about ".html", ".vcproj" and
similar extensions? The 3-character assumption has been dead for over 15 years.

Then, you have to make sure that CFZFile::IdentifierGetDefaults(TypeGet()) can ONLY
return 8-bit characters; that is not always a safe assumption in a Unicode environment!
Why is it encoding a string as a DWORD anyway? But if it DOES return 8-bit characters,
why are you using a local variable and returning a pointer to it? It would make more
sense to write

CString CInfFile::ExtensionGetDefault() const {
DWORD dwDefID = CFZFile::IdentifierGetDefaults(TypeGet());
dwDefID &= 0x00FFFFFF;
return CString((LPCSTR)&dwDefID);
}
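
For readers without MFC at hand, the same return-by-value idea can be sketched in portable C++. This is only an illustration under stated assumptions: ExtensionFromId and WidenAscii are hypothetical names, std::uint32_t stands in for DWORD, the characters are assumed to be packed lowest byte first, and the widening step is only correct for plain ASCII.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: unpack up to three 8-bit characters from the low
// three bytes of id, lowest byte first, stopping at the first NUL byte.
// The result is returned by value, so no pointer to a local escapes.
std::string ExtensionFromId(std::uint32_t id) {
    std::string ext;
    for (int shift = 0; shift < 24; shift += 8) {
        const char c = static_cast<char>((id >> shift) & 0xFF);
        if (c == '\0') break;
        ext.push_back(c);
    }
    return ext;
}

// Hypothetical helper: widen a narrow string one character at a time.
// This is only valid for ASCII content (an assumption of this sketch).
std::wstring WidenAscii(const std::string& narrow) {
    std::wstring wide;
    wide.reserve(narrow.size());
    for (unsigned char c : narrow) {
        wide.push_back(static_cast<wchar_t>(c));
    }
    return wide;
}
```

Called as ExtensionFromId(0x00747874), this yields "txt", and WidenAscii turns that into a wide string for a Unicode caller. The copy costs a few bytes per call, which is usually a fair price for a result that stays valid after the function returns.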

It is never valid to return a pointer to a stack variable, but do NOT be tempted to add the
word 'static' to the declaration. This would create a program which would be an example
of worst-possible-programming-methodology, and it would be erroneous in another way:
every call would share one buffer, so each call overwrites the previous result and the
function is not thread-safe.
joe
****

}

thx, -.rhavin;)
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm