Re: Char.IsPunctuation vs. CRT is(w)punct





I should add that there is an open-source C# implementation of stringprep
that is part of libidn. This implementation is a bit memory-hungry and not
exactly tuned for optimal performance, but it works.


--
Chris Mullins, MCSD.NET, MCPD:Enterprise
http://www.coversant.net/blogs/cmullins


"Chris Mullins" <cmullins@xxxxxxxxx> wrote in message
news:u$qVDStAHHA.2276@xxxxxxxxxxxxxxxxxxxxxxx
Once you hit Unicode land, I think determining punctuation is difficult.
There is a good answer though:
Stringprep - http://www.ietf.org/rfc/rfc3454.txt

Stringprep addresses case folding, whitespace, prohibited characters,
bidirectional validity, and normalization form.

An example profile is nameprep, which is how Internationalized Domain
Names work:
http://tools.ietf.org/html/rfc3491

Another example profile is "resourceprep" which is part of the XMPP
standard:
http://www.xmpp.org/internet-drafts/attic/draft-ietf-xmpp-resourceprep-03.html

For example, this profile prohibits all characters in the following
RFC 3454 tables:
Table C.1.2
Table C.2.1
Table C.2.2
Table C.3
Table C.4
Table C.5
Table C.6
Table C.7
Table C.8
Table C.9

It specifies Unicode normalization form KC, and that bidirectional
checking must be performed.
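
To make the steps above concrete, here is a rough sketch in Python, chosen only because its standard library ships the RFC 3454 tables in the `stringprep` module. The function name `prep_sketch` is made up for illustration. It covers only the three pieces listed above (normalization form KC, the prohibited tables, and the bidirectional check from RFC 3454 section 6), so it should not be read as a complete or conformant resourceprep implementation.

```python
import stringprep
import unicodedata

# Membership tests for the prohibited tables listed above (RFC 3454, Appendix C).
PROHIBITED_TABLES = (
    stringprep.in_table_c12,
    stringprep.in_table_c21,
    stringprep.in_table_c22,
    stringprep.in_table_c3,
    stringprep.in_table_c4,
    stringprep.in_table_c5,
    stringprep.in_table_c6,
    stringprep.in_table_c7,
    stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def prep_sketch(text):
    """Hypothetical helper: normalize, reject prohibited characters, check bidi."""
    # Stringprep is defined against Unicode 3.2, so use that database version.
    normalized = unicodedata.ucd_3_2_0.normalize("NFKC", text)

    # Reject any character that appears in one of the prohibited tables.
    for ch in normalized:
        if any(in_table(ch) for in_table in PROHIBITED_TABLES):
            raise ValueError("prohibited character U+%04X" % ord(ch))

    # Bidirectional check: if any right-to-left character (table D.1) is
    # present, no left-to-right character (table D.2) may appear, and the
    # string must both start and end with a right-to-left character.
    if any(stringprep.in_table_d1(ch) for ch in normalized):
        if any(stringprep.in_table_d2(ch) for ch in normalized):
            raise ValueError("mixed right-to-left and left-to-right text")
        if not (stringprep.in_table_d1(normalized[0])
                and stringprep.in_table_d1(normalized[-1])):
            raise ValueError("right-to-left text must start and end with RandALCat")

    return normalized
```

Note that normalization runs before the prohibited-character check, so a compatibility character such as the "fi" ligature (U+FB01) is first folded to plain "fi" and then passes.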

--
Chris Mullins

"Jeff Pek (Autodesk)" <jeff.pek@xxxxxxxxxxxxxxxxxxx> wrote in message
news:%233seADaAHHA.5060@xxxxxxxxxxxxxxxxxxxxxxx
Hi all -

A KB article indicates that Char.IsPunctuation is the .NET "equivalent" of
the CRT's isXpunct (e.g., iswpunct) function. However, I've found
significant differences in their behaviors. As a test, I ran each
function through the first 1000 or so Unicode characters, and found the
results that follow. They identify the characters for which the two
functions returned different results, and show what the .NET method
said. I'm sure there are other differences later on in the character set.

So far, I haven't seen any documentation regarding the specific
differences. I wonder if anything exists. Thanks for any pointers.

Regards,
Jeff

--------

IsPunctuation mismatch: ! (33). .NET says: True
IsPunctuation mismatch: " (34). .NET says: True
IsPunctuation mismatch: # (35). .NET says: True
IsPunctuation mismatch: $ (36). .NET says: False
IsPunctuation mismatch: % (37). .NET says: True
IsPunctuation mismatch: & (38). .NET says: True
IsPunctuation mismatch: ' (39). .NET says: True
IsPunctuation mismatch: ( (40). .NET says: True
IsPunctuation mismatch: ) (41). .NET says: True
IsPunctuation mismatch: * (42). .NET says: True
IsPunctuation mismatch: + (43). .NET says: False
IsPunctuation mismatch: , (44). .NET says: True
IsPunctuation mismatch: - (45). .NET says: True
IsPunctuation mismatch: . (46). .NET says: True
IsPunctuation mismatch: / (47). .NET says: True
IsPunctuation mismatch: : (58). .NET says: True
IsPunctuation mismatch: ; (59). .NET says: True
IsPunctuation mismatch: < (60). .NET says: False
IsPunctuation mismatch: = (61). .NET says: False
IsPunctuation mismatch: > (62). .NET says: False
IsPunctuation mismatch: ? (63). .NET says: True
IsPunctuation mismatch: @ (64). .NET says: True
IsPunctuation mismatch: [ (91). .NET says: True
IsPunctuation mismatch: \ (92). .NET says: True
IsPunctuation mismatch: ] (93). .NET says: True
IsPunctuation mismatch: ^ (94). .NET says: False
IsPunctuation mismatch: _ (95). .NET says: True
IsPunctuation mismatch: ` (96). .NET says: False
IsPunctuation mismatch: { (123). .NET says: True
IsPunctuation mismatch: | (124). .NET says: False
IsPunctuation mismatch: } (125). .NET says: True
IsPunctuation mismatch: ~ (126). .NET says: False
IsPunctuation mismatch: ¡ (161). .NET says: True
IsPunctuation mismatch: ¢ (162). .NET says: False
IsPunctuation mismatch: £ (163). .NET says: False
IsPunctuation mismatch: ¤ (164). .NET says: False
IsPunctuation mismatch: ¥ (165). .NET says: False
IsPunctuation mismatch: ¦ (166). .NET says: False
IsPunctuation mismatch: § (167). .NET says: False
IsPunctuation mismatch: ¨ (168). .NET says: False
IsPunctuation mismatch: © (169). .NET says: False
IsPunctuation mismatch: ª (170). .NET says: False
IsPunctuation mismatch: « (171). .NET says: True
IsPunctuation mismatch: ¬ (172). .NET says: False
IsPunctuation mismatch: - (173). .NET says: True
IsPunctuation mismatch: ® (174). .NET says: False
IsPunctuation mismatch: ¯ (175). .NET says: False
IsPunctuation mismatch: ° (176). .NET says: False
IsPunctuation mismatch: ± (177). .NET says: False
IsPunctuation mismatch: ² (178). .NET says: False
IsPunctuation mismatch: ³ (179). .NET says: False
IsPunctuation mismatch: ´ (180). .NET says: False
IsPunctuation mismatch: µ (181). .NET says: False
IsPunctuation mismatch: ¶ (182). .NET says: False
IsPunctuation mismatch: · (183). .NET says: True
IsPunctuation mismatch: ¸ (184). .NET says: False
IsPunctuation mismatch: ¹ (185). .NET says: False
IsPunctuation mismatch: º (186). .NET says: False
IsPunctuation mismatch: » (187). .NET says: True
IsPunctuation mismatch: ¼ (188). .NET says: False
IsPunctuation mismatch: ½ (189). .NET says: False
IsPunctuation mismatch: ¾ (190). .NET says: False
IsPunctuation mismatch: ¿ (191). .NET says: True
IsPunctuation mismatch: × (215). .NET says: False
IsPunctuation mismatch: ÷ (247). .NET says: False
IsPunctuation mismatch: ; (894). .NET says: True
IsPunctuation mismatch: · (903). .NET says: True
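
One way to read the ASCII part of this listing: Char.IsPunctuation is documented in terms of the Unicode punctuation categories (Pc, Pd, Ps, Pe, Pi, Pf, Po), while in the "C" locale ispunct accepts every printable ASCII character that is neither a space nor alphanumeric. The sketch below reproduces that comparison in Python rather than .NET, so treat it as an approximation: `is_unicode_punctuation`, `is_c_locale_punct` and `compare` are hypothetical helper names, `string.punctuation` stands in for the C-locale ispunct set, and Python's Unicode database is newer than the one .NET used at the time, so a few code points above ASCII may be classified differently. The CRT's wide-character behavior above ASCII is not modeled at all.

```python
import string
import unicodedata


def is_unicode_punctuation(ch):
    """True when the Unicode general category is one of the P* classes."""
    return unicodedata.category(ch).startswith("P")


def is_c_locale_punct(ch):
    """Stand-in for ispunct in the "C" locale (ASCII only)."""
    return ch in string.punctuation


def compare(limit=128):
    """Return (code point, character, category) where the two tests disagree."""
    rows = []
    for cp in range(limit):
        ch = chr(cp)
        if is_unicode_punctuation(ch) != is_c_locale_punct(ch):
            rows.append((cp, ch, unicodedata.category(ch)))
    return rows


if __name__ == "__main__":
    for cp, ch, category in compare():
        print("%s (%d): Unicode category %s" % (ch, cp, category))
```

For the ASCII range the disagreements are exactly the characters reported as False above ($ + < = > ^ ` | ~). Unicode classifies these as symbols (categories Sc, Sm and Sk) rather than punctuation.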





