Re: ToUpper() Better solution



The closest thing that comes to mind is an RFC called stringprep. There are
a wide variety of stringprep profiles, and while they don't quite do what
you're looking for, they're close. Included in stringprep is a set of
mapping tables for Upper->Lower case conversions. These are (in that
context) called case-foldings, and are found in table B.2. Unfortunately,
they're Upper->Lower, not the other way around.
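
To make the case-folding idea concrete, here is a minimal sketch in C#. The
class and method names (CaseFoldSketch, Apply) are made up for illustration,
and the four entries are only a hand-copied sample in the style of table B.2,
not the full table. Note that one entry maps a single character to two
characters, which is one reason the table can't simply be inverted to get
Lower->Upper.

```csharp
using System.Collections.Generic;
using System.Text;

static class CaseFoldSketch
{
    // Hypothetical excerpt: a few mappings in the style of RFC 3454 table B.2.
    // The real table is far larger; these entries are illustrative only.
    private static readonly Dictionary<char, string> Fold =
        new Dictionary<char, string>
        {
            { 'A', "a" },
            { '\u00C9', "\u00E9" }, // E with acute -> e with acute
            { '\u00DF', "ss" },     // sharp s -> "ss" (one char becomes two)
            { '\u00B5', "\u03BC" }, // micro sign -> Greek small letter mu
        };

    // Replaces each character that has a table entry; others pass through.
    public static string Apply(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            string mapped;
            if (Fold.TryGetValue(c, out mapped))
                sb.Append(mapped);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With only these sample entries, CaseFoldSketch.Apply("A\u00DF") gives "ass":
the sharp s expands to two characters, and nothing in the output says which
"ss" came from a sharp s.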

Stringprep:
http://www.faqs.org/rfcs/rfc3454.html

There are a number of profiles:
[Profile for International Domain Names]
http://www.rfc-editor.org/rfc/rfc3491.txt

[Profile for iSCSI names]
http://tools.ietf.org/html/draft-ietf-ips-iscsi-string-prep-01

[Profile for SASL UserNames & Passwords]
http://www.ietf.org/rfc/rfc4013.txt

[Profile for XMPP Resources]
http://www.xmpp.org/internet-drafts/attic/draft-ietf-xmpp-resourceprep-02.html

There's a C# implementation of this RFC that's part of the libidn library.
There's also a C++ & Java version.
http://www.gnu.org/software/libidn/

We've actually got a full implementation of stringprep as well - it's much
more .Net 2.0 ish than the libidn one, which is just a native C++ app that
was then ported to Java & .Net. It's found in our open-source SoapBox
Framework.

--
Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP
http://www.coversant.com/blogs/cmullins

"Jon Skeet [C# MVP]" <skeet@xxxxxxxxx> wrote in message
news:MPG.2050e52a399075e398d906@xxxxxxxxxxxxxxxxxxxxxxx
Ornette <abstrait...nospam...@xxxxxxx> wrote:
So how would you do ?

The mapping table idea you had before looked best to me, although I
wouldn't quite implement it the same way. I'd have a look up table for
every possible character, where it defaults to the Unicode character,
but for all the accented characters you care about, you specify the
non-accented version.

You'd then call ToCharArray() on the string in question, go through
each character replacing the original with the mapped character, and
then create a new string with the char array.

It does require you to manually map all the accented characters you
care about though.
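
A minimal sketch of the lookup-table approach described above, under some
assumptions: the class and method names (AccentStripper, StripAccents) are
invented for illustration, and the handful of mappings is just a sample that
you would extend by hand with the characters you care about.

```csharp
using System;

static class AccentStripper
{
    // One entry per UTF-16 code unit; each entry defaults to the character itself.
    private static readonly char[] Table = BuildTable();

    private static char[] BuildTable()
    {
        char[] table = new char[char.MaxValue + 1];
        for (int i = 0; i < table.Length; i++)
        {
            table[i] = (char)i;
        }

        // Sample mappings only (assumption): add every accented character you need.
        table['\u00E9'] = 'e'; // e with acute
        table['\u00E8'] = 'e'; // e with grave
        table['\u00EA'] = 'e'; // e with circumflex
        table['\u00E0'] = 'a'; // a with grave
        table['\u00E7'] = 'c'; // c with cedilla
        table['\u00C9'] = 'E'; // E with acute
        return table;
    }

    // Copies the string to a char array, maps each character through the
    // table, and builds a new string from the result.
    public static string StripAccents(string input)
    {
        char[] chars = input.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            chars[i] = Table[chars[i]];
        }
        return new string(chars);
    }
}
```

For example, AccentStripper.StripAccents("\u00E9t\u00E9") returns "ete", and
calling ToUpperInvariant() on that result gives "ETE". Characters without an
explicit mapping come back unchanged.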

My guess is that there are libraries around to do this somewhere, but I
don't know of any myself.

--
Jon Skeet - <skeet@xxxxxxxxx>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

