Re: converting std::basic_string to upper or lower case.

Thanks

"Tom Widmer" <tom_usenet@xxxxxxxxxxx> wrote in message
news:OL0qojJVFHA.3184@xxxxxxxxxxxxxxxxxxxxxxx
> Andy Coates wrote:
>> Apologies in advance on this one - I realise this is probably a common
>> question - but I can't seem to find a solution on the newsgroups.
>>
>> How would you convert a basic_string to upper-case in-place? In the past
>> I've always done it like so:
>>
>> std::string str1 = "My-/$%^Mixed_Case_String";
>> std::transform(str1.begin(), str1.end(), str1.begin(), ::toupper);
>>
>> However, a colleague of mine has just piped up to say he's had issues
>> with this where he was passing a mixed case string in and some of the
>> upper case characters were coming back messed up. He then quoted the
>> MSDN spiel on the ASCII version of toupper:
>>
>> In order for toupper to give the expected results, __isascii and
>> islower must both return nonzero
>>
>> This suggests that if a character is already upper case (i.e. islower
>> returns false) then the results are undefined. Is this right, or is the
>> documentation in error? Seems mighty strange to me...
>
> The documentation is misleading at best. All of the is* and to* functions
> have as their domain the non-negative values representable as unsigned
> char, plus EOF (and Microsoft appear to limit this further to characters
> 0-127 + EOF). So if you pass a negative char value it isn't going to be
> treated correctly (it's undefined behaviour, actually, unless it happens
> to equal EOF) - you should cast to unsigned char first. So really you
> should use the locale-based version you show below.
>
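
For anyone who wants to keep the std::transform form, the cast described above can be sketched roughly as follows. This is only a minimal sketch: the helper name to_upper_ascii_safe is made up for illustration, and the lambda assumes a C++11 or later compiler (on an older compiler a small named function would do the same job).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (name is an assumption, not from the thread):
// upper-cases `s` in place using the <cctype> toupper.
inline void to_upper_ascii_safe(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   // Taking the character as unsigned char means a negative
                   // char value is never handed to std::toupper.
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
}
```

Called on the string from the original question, this should leave the punctuation alone and upper-case only the letters, since toupper returns its argument unchanged when there is no upper-case mapping.
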
>> One alternative I found was to use:
>> std::use_facet<std::ctype<char> >(std::locale())
>>     .toupper(&*str1.begin(), &*str1.end());
>>
>> but this is hardly pleasing on the eye - and I'm not even sure it's
>> guaranteed to work, as I can't remember off the top of my head whether the
>> memory behind a basic_string is guaranteed to be contiguous - I have a
>> nagging suspicion it's not, and that only the memory returned from
>> c_str() is.
>
> Your suspicions are correct.
>
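
A note for later readers: from C++11 onwards the standard does require basic_string storage to be contiguous, so the facet approach can be made to work on a current compiler. The snippet above also dereferences end(), which is not allowed, so a pointer plus size() is the safer way to get the range. A rough sketch under those assumptions (to_upper_facet is a made-up name):

```cpp
#include <locale>
#include <string>

// Hypothetical helper (name is an assumption): upper-cases `s` in place
// through the ctype facet of `loc`. Assumes C++11 or later, where the
// characters of a std::string are stored contiguously.
inline void to_upper_facet(std::string& s,
                           const std::locale& loc = std::locale()) {
    if (s.empty()) return;  // nothing to convert
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    char* first = &s[0];
    // ctype<char>::toupper(char*, const char*) converts [first, last) in place.
    ct.toupper(first, first + s.size());
}
```

On a pre-C++11 compiler the contiguity assumption is not guaranteed by the standard, so a per-character loop would be the conservative choice there.
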
> The solution is to write a string toupper function, implementing it
> however you want (e.g. with a for loop, or an algorithm), so you don't
> have to worry about it ever again! Even better, use one that's been
> written for you: http://www.boost.org/doc/html/string_algo.html
>
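
If pulling in Boost isn't an option, a write-it-once helper along the lines suggested above might look something like this. It is a sketch only: to_upper_in_place is a made-up name, and it uses the two-argument std::toupper from <locale>, which takes the character type directly and so avoids the negative-char problem.

```cpp
#include <locale>
#include <string>

// Hypothetical helper (name is an assumption): upper-cases any
// basic_string in place, one character at a time, using `loc`.
template <class CharT, class Traits, class Alloc>
void to_upper_in_place(std::basic_string<CharT, Traits, Alloc>& s,
                       const std::locale& loc = std::locale()) {
    typedef typename std::basic_string<CharT, Traits, Alloc>::size_type size_type;
    for (size_type i = 0; i < s.size(); ++i) {
        // std::toupper(c, loc) forwards to the ctype<CharT> facet of loc.
        s[i] = std::toupper(s[i], loc);
    }
}
```

Because it only indexes the string, it makes no assumption about contiguous storage, and the same function should work for std::wstring as well as std::string.
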
> Tom

