Re: MFC Interview Tests
- From: Joseph M. Newcomer <newcomer@xxxxxxxxxxxx>
- Date: Mon, 23 Mar 2009 15:48:05 -0500
Note that in the days of this example, we were still using only 8-bit character strings.
Surrogate pairs pose a huge number of problems, and by the time we start seriously moving
to Win64, we should be thinking about wchar_t being 32 bits (it already is in some
compilers). The problem with surrogate pairs is they give you all the jobs of MBCS all
over again!
joe
On Mon, 23 Mar 2009 17:44:42 +0100, "Giovanni Dicanio"
<giovanniDOTdicanio@xxxxxxxxxxxxxxxxx> wrote:
Joseph M. Newcomer [MVP]
"Joseph M. Newcomer" <newcomer@xxxxxxxxxxxx> ha scritto nel messaggio
news:946cs49312ru1k4e3cv03kb8gibv14b3cl@xxxxxxxxxx
The string-reverse question is trivial
strrev()
works fine. End of discussion.
I think the Unicode version of strrev has problems...
It seems to me that it fails to properly handle Unicode strings that contain
surrogate pairs.
Please see the simple demo project here, and screenshots in the zip file:
http://www.geocities.com/giovanni.dicanio/vc/TestStringRev.zip
For example, the Unicode character U+20001, which is encoded in UTF-16 as
the surrogate pair 0xD840 0xDC01, is not handled properly by the reversal.
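To make the example concrete, here is a minimal sketch of how two UTF-16 surrogate code units combine into one code point. The helper names (IsHighSurrogate, IsLowSurrogate, CombineSurrogates) are made up for illustration and are not from the demo project; char16_t is used so the sketch does not depend on the width of wchar_t.

```cpp
#include <cassert>

// Hypothetical helpers, named for illustration only.

// True if 'u' is a UTF-16 high (lead) surrogate: 0xD800..0xDBFF.
inline bool IsHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// True if 'u' is a UTF-16 low (trail) surrogate: 0xDC00..0xDFFF.
inline bool IsLowSurrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Combines a high/low surrogate pair into the code point it encodes.
// Each surrogate carries 10 bits; the result is offset by 0x10000.
inline char32_t CombineSurrogates(char16_t high, char16_t low)
{
    assert(IsHighSurrogate(high) && IsLowSurrogate(low));
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                   + (static_cast<char32_t>(low) - 0xDC00);
}
```

With the pair above, CombineSurrogates(0xD840, 0xDC01) gives 0x20001. Swapping the two code units leaves a low surrogate in front of a high one, which is not a valid encoding of any character.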
If I had to implement strrev(), I would consider a couple of pointers: a
'left' pointer pointing to the beginning of the string, and a 'right'
pointer pointing to the end of the string.
At each loop iteration, the characters pointed to by the left and right
pointers are swapped.
The left pointer is moved to the right (++), and the right pointer is moved
to the left (--).
The loop continues while left pointer <= right pointer.
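The two-pointer loop described above can be sketched roughly as follows. This is only an illustration of the idea, not the CRT's actual strrev() code; the names ReverseCodeUnits and DemoReverseCodeUnits are invented here, and the function is a template so the same loop serves both 'char' and 16-bit strings.

```cpp
#include <cassert>
#include <utility>

// Hypothetical sketch: reverses a null-terminated string in place,
// one code unit at a time, using two converging pointers.
template <typename CharT>
CharT* ReverseCodeUnits(CharT* str)
{
    if (str == nullptr || *str == 0)
        return str;                      // nothing to reverse

    CharT* left = str;                   // first code unit
    CharT* right = str;
    while (right[1] != 0)                // advance to the last code unit
        ++right;

    while (left < right) {               // stop when the pointers meet or cross
        std::swap(*left, *right);
        ++left;
        --right;
    }
    return str;
}

// Usage sketch: fine for ASCII, but a surrogate pair comes out swapped.
inline void DemoReverseCodeUnits()
{
    char ascii[] = "abc";
    ReverseCodeUnits(ascii);
    assert(ascii[0] == 'c' && ascii[1] == 'b' && ascii[2] == 'a');

    char16_t pair[] = { 0xD840, 0xDC01, u'A', 0 };
    ReverseCodeUnits(pair);
    // The low surrogate now precedes the high one: ill-formed UTF-16.
    assert(pair[0] == u'A' && pair[1] == 0xDC01 && pair[2] == 0xD840);
}
```

The loop here stops at left < right instead of left <= right; the only difference is that the middle element of an odd-length string is not swapped with itself.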
I think that this algorithm works in case of ASCII strings, when each
character is stored in one 'char'.
I think that this algorithm also works in case of Unicode UTF-16 strings
without surrogate pairs, when each character is stored in one 'WCHAR'.
But this algorithm fails in case of surrogate pairs... maybe UTF-32 should
be used to make this algorithm work fine in Unicode?
(In fact, I think that in UTF-32 there is no concept of surrogate pair, and
all characters are stored in 32-bit DWORDs...).
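Short of converting to UTF-32, one possible way to keep surrogate pairs intact is to reverse the code units first and then swap each reversed pair back into high/low order. The sketch below assumes well-formed UTF-16 input (no unpaired surrogates); ReverseUtf16 and its helpers are hypothetical names. It reverses by code point only, so combining sequences such as a base letter followed by a combining accent would still be scrambled, and that holds for UTF-32 as well.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helpers, named for illustration only.
inline bool IsHigh(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool IsLow(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Reverses 's' by code point, assuming well-formed UTF-16.
inline void ReverseUtf16(std::u16string& s)
{
    // Pass 1: plain code-unit reversal. Every surrogate pair is now
    // stored in the wrong order: low surrogate first, then high.
    std::reverse(s.begin(), s.end());

    // Pass 2: put each reversed pair back into high, low order.
    for (std::size_t i = 0; i + 1 < s.size(); ++i) {
        if (IsLow(s[i]) && IsHigh(s[i + 1])) {
            std::swap(s[i], s[i + 1]);
            ++i;                         // skip the pair just repaired
        }
    }
}

// Usage sketch: U+20001 followed by "AB" becomes "BA" followed by U+20001.
inline void DemoReverseUtf16()
{
    std::u16string s = u"\U00020001AB";
    ReverseUtf16(s);
    assert(s == u"BA\U00020001");
}
```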
Giovanni
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm