Re: Want Input boxes to accept unicode strings on Standard Window

I'm usually starting with something written in terms of 'char' and no sign of _T()
anywhere. I usually manage to avoid the example below simply by making sure there is no
undecorated quoted string anywhere, so what I usually do is just replace all quoted
strings with _T("...") (one regular expression handles this, typically). Then I'm left
with the few strings that had \" in them (I usually don't bother with a more complex
pattern) but these blow up immediately. GetProcAddress calls then complain too, because
the function takes only an 8-bit name string, so these are easy to find.
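
As a rough illustration of the single-pattern substitution described above, here is a minimal sketch in portable C++. The function name wrapLiterals is hypothetical, and the sketch assumes the source has no _T() anywhere yet (already-wrapped literals would be wrapped twice). Like the simple pattern mentioned above, it makes no attempt to handle \" inside a literal; such lines come out broken and fail to compile, which is how they get noticed.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: wraps every "..." literal on one source line in _T(...).
// Assumptions: no literal is already wrapped, and escaped quotes (\") are
// not handled, so those lines need manual repair afterwards.
std::string wrapLiterals(const std::string& line)
{
    // Leave #include "file.h" lines alone; the file name is not a string literal.
    if (line.find("#include") != std::string::npos)
        return line;

    // A quote, any run of non-quote characters, a closing quote.
    static const std::regex literal(R"("[^"]*")");

    // $& stands for the whole match in the ECMAScript replacement format.
    return std::regex_replace(line, literal, "_T($&)");
}
```

Applied to the line CString cs = "My String"; this should yield CString cs = _T("My String");. The GetProcAddress calls still have to be unwrapped by hand afterwards.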

The *sizeof(TCHAR) and /sizeof(TCHAR) are the hardest to find, but these are usually found
quickly by looking for the str*() functions, ReadFile, WriteFile, and their aliases. It
usually takes about two days of raw editing before it all compiles. But as I said, the
effort is just about constant, and the reason they send the code to me is that they see it
as a "massive effort" or, in one case, "we need a complete rewrite in Unicode and can't
afford it" (about 150SLOC, five days, and it was only my second conversion, so it was a
bit slower than I can do today, because I was discovering the patterns I needed to worry
about). So what I've learned over the last eight years or so of doing this is that
perception of the complexity is often much higher than the actual complexity.
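
The *sizeof(TCHAR) and /sizeof(TCHAR) fixes mentioned above come down to keeping byte counts and character counts apart. The sketch below is only an illustration under stated assumptions: TCHAR is defined here as a 16-bit stand-in so the code compiles anywhere (on Windows it would come from <tchar.h>), and the helper names are hypothetical.

```cpp
#include <cstddef>

// Stand-in for the Windows TCHAR in a Unicode build (assumption: 16-bit
// code units). Real code would get TCHAR from <tchar.h> instead.
using TCHAR = char16_t;

// Byte-oriented calls such as ReadFile and WriteFile want a byte count,
// so a length in characters has to be multiplied by sizeof(TCHAR).
constexpr std::size_t bytesFromChars(std::size_t chars)
{
    return chars * sizeof(TCHAR);
}

// Character-oriented calls want an element count, but sizeof(buffer)
// is in bytes, so it has to be divided by sizeof(TCHAR).
constexpr std::size_t charsFromBytes(std::size_t bytes)
{
    return bytes / sizeof(TCHAR);
}
```

In an 8-bit build both helpers are the identity, which is why code written in terms of 'char' gets away without them until the conversion.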

I've probably done a dozen of these by now, and the only serious glitches are at the
interfaces of reading and writing files and network packets, and there we have to make
decisions about saving files as Unicode or UTF-8, reading Unicode or UTF-8, and ditto for
network transfers.
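
If the decision at a file or network boundary is UTF-8, the in-memory UTF-16 text has to be encoded on the way out. On Windows this would normally be a call to WideCharToMultiByte with CP_UTF8; the sketch below is a portable stand-in that only shows what that conversion does. The name toUtf8 is hypothetical, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch, not something stated above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encodes UTF-16 text as UTF-8.
// Assumption: unpaired surrogates are replaced with U+FFFD.
std::string toUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool high = cp >= 0xD800 && cp <= 0xDBFF;
        const bool low  = cp >= 0xDC00 && cp <= 0xDFFF;
        if (high && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point above U+FFFF.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (high || low) {
            cp = 0xFFFD; // unpaired surrogate
        }

        if (cp < 0x80) {                 // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {         // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {       // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                         // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Reading goes the other way (MultiByteToWideChar on Windows), and either way the file format has to say which encoding it uses, which is the decision referred to above.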

No, I never got any code to post.
joe
On Tue, 24 Jul 2007 17:39:54 -0700, "Tom Serface" <tom.nospam@xxxxxxxxxxxxx> wrote:

Hi Joe,

I think the effort to do a really big application can be pretty "huge", but
you have to weigh it against the effort of trying to get it to work other
ways. A lot of it depends on how the strings are implemented as well. If
they are mostly in files or the .rc file then it is easy to convert them
(you can even use Notepad). The 2005 version of VS will even use Unicode
.RC files if they are converted first which is very handy. Users will also
have to use the _T() and TCHAR macros which is why we harp on it so much
even when just starting with MBCS. Still, you're right, once you go through
the process a couple of times the conversion thing is pretty academic. The
compiler will gripe about strings that don't have the _T() macro around them
so they are easy to find for the most part. The hardest things are things
like:

CString cs = "My String";

Since CString will try to convert the string from MBCS...

I think Giovanni wrote some code to update un-macro'd strings to the _T()
versions. Did you post that on your site?

Tom

"Joseph M. Newcomer" <newcomer@xxxxxxxxxxxx> wrote in message
news:766da3dava3rt8jimcqb10jkm8bcn8m4hg@xxxxxxxxxx
Are you sure about that "huge effort"? I've had non-Unicode apps cross my desk with
instructions to convert them to Unicode, and I can usually do it with about three days of
editing and testing, and get it nearly perfect...and the remaining bugs are found within a
couple days of testing. I'm talking source here on the order of 60K SLOC, but the effort
is about the same for 20K SLOC because it is usually just a long set of very similar
substitution patterns most of which are automated pattern search-and-replace with my text
editor.
joe
On Tue, 24 Jul 2007 14:14:01 -0700, Paul Wu
<PaulWu@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

Thanks for replying. As I said, building it with Unicode takes huge effort --
not feasible at current stage. We just want the applications to be able to
process some Unicode texts now (on Standard Windows XP).

I looked at the application -- it was built with static MFC libraries
(Visual Studio 2003). So the MFC libraries may not be the problem -- I just
don't understand why when it runs on Chinese Windows XP, the Edit Controls
can accept Chinese texts.

Paul
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm