Re: Char.IsPunctuation vs. CRT is(w)punct
- From: "Chris Mullins" <cmullins@xxxxxxxxx>
- Date: Tue, 7 Nov 2006 17:33:31 -0800
I should add that there is an open-source C# implementation of stringprep that
is part of libidn. This implementation is a bit memory-hungry, and not exactly
tuned for optimal performance, but it works.
--
Chris Mullins, MCSD.NET, MCPD:Enterprise
http://www.coversant.net/blogs/cmullins
"Chris Mullins" <cmullins@xxxxxxxxx> wrote in message
news:u$qVDStAHHA.2276@xxxxxxxxxxxxxxxxxxxxxxx
Once you hit Unicode land, I think determining punctuation is difficult.
There is a good answer though:
Stringprep - http://www.ietf.org/rfc/rfc3454.txt
Stringprep addresses case folding, whitespace, prohibited characters,
bidirectional validity, and normalization form.
An example profile is nameprep, which is how Internationalized Domain
Names work:
http://tools.ietf.org/html/rfc3491
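As a quick illustration (in Python rather than C#, and not the libidn port), Python's built-in `idna` codec implements IDNA 2003 and includes a nameprep function in the `encodings.idna` module. It shows two of the profile's steps, case folding and NFKC normalization, before the label is Punycode-encoded. This is a sketch for experimentation, not a reference implementation:

```python
import encodings.idna

# nameprep (RFC 3491) maps characters (including case folding via table B.2),
# normalizes to NFKC, then applies the prohibition and bidi checks.
label = encodings.idna.nameprep("Bücher")
print(label)  # bücher

# The idna codec runs nameprep on each non-ASCII label, then Punycode-encodes it.
print("Bücher.example".encode("idna"))  # b'xn--bcher-kva.example'
```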
Another example profile is "resourceprep" which is part of the XMPP
standard:
http://www.xmpp.org/internet-drafts/attic/draft-ietf-xmpp-resourceprep-03.html
For example, this profile prohibits all characters in the following tables:
Table C.1.2
Table C.2.1
Table C.2.2
Table C.3
Table C.4
Table C.5
Table C.6
Table C.7
Table C.8
Table C.9
It specifies Unicode normalization form KC, and requires that bidirectional
checking be performed.
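Python's standard `stringprep` module exposes the RFC 3454 tables as per-character predicates, so the prohibition and bidi requirements above can be sketched directly. The `resourceprep_check` helper below is hypothetical and deliberately incomplete: it skips the profile's mapping step, normalizes with NFKC using the Unicode 3.2 data that stringprep is defined against, rejects anything in the prohibited tables listed above, and then applies the RFC 3454 section 6 bidi rules.

```python
import stringprep
import unicodedata

# Prohibition tables listed for the resourceprep profile (C.1.2 through C.9).
PROHIBITED = (
    stringprep.in_table_c12, stringprep.in_table_c21, stringprep.in_table_c22,
    stringprep.in_table_c3, stringprep.in_table_c4, stringprep.in_table_c5,
    stringprep.in_table_c6, stringprep.in_table_c7, stringprep.in_table_c8,
    stringprep.in_table_c9,
)

def resourceprep_check(s: str) -> str:
    """Hypothetical sketch: NFKC + prohibition + bidi checks (mapping step omitted)."""
    # Normalization form KC, using the Unicode 3.2 database stringprep refers to.
    s = unicodedata.ucd_3_2_0.normalize("NFKC", s)

    # Reject any character that appears in one of the prohibited tables.
    for ch in s:
        if any(in_table(ch) for in_table in PROHIBITED):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")

    # Bidi rules (RFC 3454, section 6): if any RandALCat (table D.1) character
    # is present, no LCat (table D.2) character may appear, and the string
    # must both start and end with a RandALCat character.
    if any(stringprep.in_table_d1(ch) for ch in s):
        if any(stringprep.in_table_d2(ch) for ch in s):
            raise ValueError("mixes RandALCat and LCat characters")
        if not (stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])):
            raise ValueError("RandALCat string must start and end with RandALCat")
    return s
```

For instance, a fullwidth `Ａ` (U+FF21) normalizes to a plain `A` and passes, while an ASCII control character such as U+0007 is rejected by table C.2.1, and a Hebrew letter followed by a Latin letter fails the bidi check.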
--
Chris Mullins
"Jeff Pek (Autodesk)" <jeff.pek@xxxxxxxxxxxxxxxxxxx> wrote in message
news:%233seADaAHHA.5060@xxxxxxxxxxxxxxxxxxxxxxx
Hi all -
A KB article indicates that Char.IsPunctuation is the .NET "equivalent" of the
CRT's isXpunct functions (e.g., iswpunct). However, I've found
significant differences in their behaviors. As a test, I ran each
function over the first 1000 or so Unicode characters, and found the
results that follow. The list identifies the characters for which the two
functions returned different results, and shows what the .NET method
said. I'm sure there are other differences later on in the character set.
So far, I haven't seen any documentation regarding the specific
differences, and I wonder whether any exists. Thanks for any pointers.
Regards,
Jeff
--------
IsPunctuation mismatch: ! (33). .NET says: True
IsPunctuation mismatch: " (34). .NET says: True
IsPunctuation mismatch: # (35). .NET says: True
IsPunctuation mismatch: $ (36). .NET says: False
IsPunctuation mismatch: % (37). .NET says: True
IsPunctuation mismatch: & (38). .NET says: True
IsPunctuation mismatch: ' (39). .NET says: True
IsPunctuation mismatch: ( (40). .NET says: True
IsPunctuation mismatch: ) (41). .NET says: True
IsPunctuation mismatch: * (42). .NET says: True
IsPunctuation mismatch: + (43). .NET says: False
IsPunctuation mismatch: , (44). .NET says: True
IsPunctuation mismatch: - (45). .NET says: True
IsPunctuation mismatch: . (46). .NET says: True
IsPunctuation mismatch: / (47). .NET says: True
IsPunctuation mismatch: : (58). .NET says: True
IsPunctuation mismatch: ; (59). .NET says: True
IsPunctuation mismatch: < (60). .NET says: False
IsPunctuation mismatch: = (61). .NET says: False
IsPunctuation mismatch: > (62). .NET says: False
IsPunctuation mismatch: ? (63). .NET says: True
IsPunctuation mismatch: @ (64). .NET says: True
IsPunctuation mismatch: [ (91). .NET says: True
IsPunctuation mismatch: \ (92). .NET says: True
IsPunctuation mismatch: ] (93). .NET says: True
IsPunctuation mismatch: ^ (94). .NET says: False
IsPunctuation mismatch: _ (95). .NET says: True
IsPunctuation mismatch: ` (96). .NET says: False
IsPunctuation mismatch: { (123). .NET says: True
IsPunctuation mismatch: | (124). .NET says: False
IsPunctuation mismatch: } (125). .NET says: True
IsPunctuation mismatch: ~ (126). .NET says: False
IsPunctuation mismatch: ¡ (161). .NET says: True
IsPunctuation mismatch: ¢ (162). .NET says: False
IsPunctuation mismatch: £ (163). .NET says: False
IsPunctuation mismatch: ¤ (164). .NET says: False
IsPunctuation mismatch: ¥ (165). .NET says: False
IsPunctuation mismatch: ¦ (166). .NET says: False
IsPunctuation mismatch: § (167). .NET says: False
IsPunctuation mismatch: ¨ (168). .NET says: False
IsPunctuation mismatch: © (169). .NET says: False
IsPunctuation mismatch: ª (170). .NET says: False
IsPunctuation mismatch: « (171). .NET says: True
IsPunctuation mismatch: ¬ (172). .NET says: False
IsPunctuation mismatch: - (173). .NET says: True
IsPunctuation mismatch: ® (174). .NET says: False
IsPunctuation mismatch: ¯ (175). .NET says: False
IsPunctuation mismatch: ° (176). .NET says: False
IsPunctuation mismatch: ± (177). .NET says: False
IsPunctuation mismatch: ² (178). .NET says: False
IsPunctuation mismatch: ³ (179). .NET says: False
IsPunctuation mismatch: ´ (180). .NET says: False
IsPunctuation mismatch: µ (181). .NET says: False
IsPunctuation mismatch: ¶ (182). .NET says: False
IsPunctuation mismatch: · (183). .NET says: True
IsPunctuation mismatch: ¸ (184). .NET says: False
IsPunctuation mismatch: ¹ (185). .NET says: False
IsPunctuation mismatch: º (186). .NET says: False
IsPunctuation mismatch: » (187). .NET says: True
IsPunctuation mismatch: ¼ (188). .NET says: False
IsPunctuation mismatch: ½ (189). .NET says: False
IsPunctuation mismatch: ¾ (190). .NET says: False
IsPunctuation mismatch: ¿ (191). .NET says: True
IsPunctuation mismatch: × (215). .NET says: False
IsPunctuation mismatch: ÷ (247). .NET says: False
IsPunctuation mismatch: ; (894). .NET says: True
IsPunctuation mismatch: · (903). .NET says: True
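The .NET side of this comparison can be approximated outside .NET: Char.IsPunctuation is documented as returning true for the Unicode punctuation categories (connector, dash, open, close, initial quote, final quote, and other punctuation), whereas ispunct in the "C" locale covers printable ASCII characters that are neither alphanumeric nor space. The Python sketch below models only that "C"-locale ASCII behavior; what a given CRT's iswpunct reports beyond ASCII is locale- and implementation-dependent and is not modeled here. The function names are illustrative. One more thing worth checking in a test harness like this: the CRT classification functions return a nonzero int rather than exactly 1, so normalize the result (e.g. `iswpunct(c) != 0`) before comparing it with a .NET bool.

```python
import string
import unicodedata

# Unicode general categories that Char.IsPunctuation treats as punctuation.
NET_PUNCT_CATEGORIES = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}

def is_unicode_punctuation(ch: str) -> bool:
    """Approximates Char.IsPunctuation using Python's Unicode database."""
    return unicodedata.category(ch) in NET_PUNCT_CATEGORIES

def is_c_locale_punct(ch: str) -> bool:
    """ispunct() in the "C" locale: printable ASCII, not alphanumeric, not space."""
    return ch in string.punctuation

def mismatches(limit: int = 1000) -> list[tuple[int, str, bool]]:
    """Code points below `limit` where the two classifiers disagree."""
    out = []
    for cp in range(limit):
        ch = chr(cp)
        net = is_unicode_punctuation(ch)
        if net != is_c_locale_punct(ch):
            out.append((cp, ch, net))
    return out

if __name__ == "__main__":
    for cp, ch, net in mismatches():
        print(f"IsPunctuation mismatch: {ch} ({cp}). Unicode category says: {net}")
```

In the ASCII range this reports exactly `$ + < = > ^ ` | ~`, which are currency, math, and modifier symbols (categories Sc, Sm, Sk) rather than punctuation in Unicode; those are the same ASCII characters the listing above shows .NET returning False for. Python's Unicode database is newer than the one .NET 2.0 used, so a few Latin-1 results can differ; for example, the soft hyphen (U+00AD) is now in category Cf, not punctuation.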