Re: Char.IsPunctuation vs. CRT is(w)punct
- From: "Ben Voigt" <rbv@xxxxxxxxxxxxx>
- Date: Fri, 10 Nov 2006 10:41:32 -0600
"Jeff Pek (Autodesk)" <jeff.pek@xxxxxxxxxxxxxxxxxxx> wrote in message
news:ORQer8EBHHA.1780@xxxxxxxxxxxxxxxxxxxxxxx
Thanks, all. This is all good stuff. What I was trying to do was to mimic
the behavior of iswpunct (and therefore the existing code). PInvoking
iswpunct seems reasonable, provided that I know that that DLL is going to
be there.
MSVCRT.DLL has been distributed with recent versions of Windows, and in
service packs for not-so-recent versions, and it exports all the character
classification functions.
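For anyone comparing the two APIs, it may help to spell out what the CRT side actually tests. The following is a minimal sketch, assuming the default "C" locale and the ASCII range only: there, the C standard defines punctuation as any printing character that is neither a space nor alphanumeric, which is why characters such as $ and + count as punctuation for the CRT. Unicode files those under symbol categories, so Char.IsPunctuation reports False for them, while Char.IsSymbol should report True. The helper names ascii_crt_punct and count_ascii_disagreements are hypothetical and exist only for this illustration.

```c
#include <ctype.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: the "C"-locale rule for punctuation, restricted
   to ASCII. A character qualifies if it is graphic (printing and not a
   space) and not alphanumeric. */
static int ascii_crt_punct(wint_t c)
{
    if (c > 127)
        return 0;
    return isgraph((int)c) && !isalnum((int)c);
}

/* Counts the ASCII code points where iswpunct disagrees with the rule
   above. Both results are normalized to 0/1 first, because the is*
   functions only promise "nonzero" for true, not the value 1. */
int count_ascii_disagreements(void)
{
    int n = 0;
    for (wint_t c = 0; c < 128; ++c) {
        int crt = iswpunct(c) != 0;
        int rule = ascii_crt_punct(c) != 0;
        if (crt != rule)
            ++n;
    }
    return n;
}
```

The normalization step also matters on the P/Invoke side: if the raw int returned by iswpunct is compared directly against a .NET bool, or against 1, every punctuation character can show up as a mismatch.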
- jp
"Chris Mullins" <cmullins@xxxxxxxxx> wrote in message
news:uY19mXtAHHA.144@xxxxxxxxxxxxxxxxxxxxxxx
I should add that there is an open-source C# implementation of stringprep
that is part of libidn. This implementation is a bit memory hungry, and not
exactly tuned for optimal performance, but it works.
--
Chris Mullins, MCSD.NET, MCPD:Enterprise
http://www.coversant.net/blogs/cmullins
"Chris Mullins" <cmullins@xxxxxxxxx> wrote in message
news:u$qVDStAHHA.2276@xxxxxxxxxxxxxxxxxxxxxxx
Once you hit Unicode land, I think determining punctuation is difficult.
There is a good answer though:
Stringprep - http://www.ietf.org/rfc/rfc3454.txt
Stringprep addresses case folding, whitespace, prohibited characters,
bidirectional validity, and normalization form.
An example profile is nameprep, which is how Internationalized Domain
Names work:
http://tools.ietf.org/html/rfc3491
Another example profile is "resourceprep" which is part of the XMPP
standard:
http://www.xmpp.org/internet-drafts/attic/draft-ietf-xmpp-resourceprep-03.html
For example, this profile prohibits all characters in:
Table C.1.2
Table C.2.1
Table C.2.2
Table C.3
Table C.4
Table C.5
Table C.6
Table C.7
Table C.8
Table C.9
It specifies Unicode normalization form KC, and that bidirectional
checking must be performed.
--
Chris Mullins
"Jeff Pek (Autodesk)" <jeff.pek@xxxxxxxxxxxxxxxxxxx> wrote in message
news:%233seADaAHHA.5060@xxxxxxxxxxxxxxxxxxxxxxx
Hi all -
A KB article indicates that Char.IsPunctuation is the "equivalent" of
the CRT's isXpunct (e.g., iswpunct) function in .NET. However, I've
found significant differences in their behaviors. As a test, I ran each
function through the first 1000 or so Unicode characters, and found the
results that follow. They list the characters for which the two
functions returned different results, and show what the .NET method
said. I'm sure there are other differences later on in the character
set.
So far, I haven't seen any documentation regarding the specific
differences. I wonder if anything exists. Thanks for any pointers.
Regards,
Jeff
--------
IsPunctuation mismatch: ! (33). .NET says: True
IsPunctuation mismatch: " (34). .NET says: True
IsPunctuation mismatch: # (35). .NET says: True
IsPunctuation mismatch: $ (36). .NET says: False
IsPunctuation mismatch: % (37). .NET says: True
IsPunctuation mismatch: & (38). .NET says: True
IsPunctuation mismatch: ' (39). .NET says: True
IsPunctuation mismatch: ( (40). .NET says: True
IsPunctuation mismatch: ) (41). .NET says: True
IsPunctuation mismatch: * (42). .NET says: True
IsPunctuation mismatch: + (43). .NET says: False
IsPunctuation mismatch: , (44). .NET says: True
IsPunctuation mismatch: - (45). .NET says: True
IsPunctuation mismatch: . (46). .NET says: True
IsPunctuation mismatch: / (47). .NET says: True
IsPunctuation mismatch: : (58). .NET says: True
IsPunctuation mismatch: ; (59). .NET says: True
IsPunctuation mismatch: < (60). .NET says: False
IsPunctuation mismatch: = (61). .NET says: False
IsPunctuation mismatch: > (62). .NET says: False
IsPunctuation mismatch: ? (63). .NET says: True
IsPunctuation mismatch: @ (64). .NET says: True
IsPunctuation mismatch: [ (91). .NET says: True
IsPunctuation mismatch: \ (92). .NET says: True
IsPunctuation mismatch: ] (93). .NET says: True
IsPunctuation mismatch: ^ (94). .NET says: False
IsPunctuation mismatch: _ (95). .NET says: True
IsPunctuation mismatch: ` (96). .NET says: False
IsPunctuation mismatch: { (123). .NET says: True
IsPunctuation mismatch: | (124). .NET says: False
IsPunctuation mismatch: } (125). .NET says: True
IsPunctuation mismatch: ~ (126). .NET says: False
IsPunctuation mismatch: ¡ (161). .NET says: True
IsPunctuation mismatch: ¢ (162). .NET says: False
IsPunctuation mismatch: £ (163). .NET says: False
IsPunctuation mismatch: ¤ (164). .NET says: False
IsPunctuation mismatch: ¥ (165). .NET says: False
IsPunctuation mismatch: ¦ (166). .NET says: False
IsPunctuation mismatch: § (167). .NET says: False
IsPunctuation mismatch: ¨ (168). .NET says: False
IsPunctuation mismatch: © (169). .NET says: False
IsPunctuation mismatch: ª (170). .NET says: False
IsPunctuation mismatch: « (171). .NET says: True
IsPunctuation mismatch: ¬ (172). .NET says: False
IsPunctuation mismatch: - (173). .NET says: True
IsPunctuation mismatch: ® (174). .NET says: False
IsPunctuation mismatch: ¯ (175). .NET says: False
IsPunctuation mismatch: ° (176). .NET says: False
IsPunctuation mismatch: ± (177). .NET says: False
IsPunctuation mismatch: ² (178). .NET says: False
IsPunctuation mismatch: ³ (179). .NET says: False
IsPunctuation mismatch: ´ (180). .NET says: False
IsPunctuation mismatch: µ (181). .NET says: False
IsPunctuation mismatch: ¶ (182). .NET says: False
IsPunctuation mismatch: · (183). .NET says: True
IsPunctuation mismatch: ¸ (184). .NET says: False
IsPunctuation mismatch: ¹ (185). .NET says: False
IsPunctuation mismatch: º (186). .NET says: False
IsPunctuation mismatch: » (187). .NET says: True
IsPunctuation mismatch: ¼ (188). .NET says: False
IsPunctuation mismatch: ½ (189). .NET says: False
IsPunctuation mismatch: ¾ (190). .NET says: False
IsPunctuation mismatch: ¿ (191). .NET says: True
IsPunctuation mismatch: × (215). .NET says: False
IsPunctuation mismatch: ÷ (247). .NET says: False
IsPunctuation mismatch: ; (894). .NET says: True
IsPunctuation mismatch: · (903). .NET says: True
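
The CRT half of a scan like the one above could be reproduced roughly as follows. This is a sketch, not the code used for the original test: the function names count_crt_punct and dump_crt_punct are made up for illustration, and the results outside the ASCII range depend on the C runtime and the active locale, so they may not match the listing.

```c
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Counts the code points in [lo, hi) that iswpunct accepts in the
   current locale. */
int count_crt_punct(wint_t lo, wint_t hi)
{
    int n = 0;
    for (wint_t c = lo; c < hi; ++c) {
        if (iswpunct(c))
            ++n;
    }
    return n;
}

/* Writes one line per code point in [lo, hi) that iswpunct accepts.
   Only the numeric value is printed, which avoids the console code-page
   problems that come with printing the character itself. */
void dump_crt_punct(FILE *out, wint_t lo, wint_t hi)
{
    for (wint_t c = lo; c < hi; ++c) {
        if (iswpunct(c))
            fprintf(out, "iswpunct: %u\n", (unsigned)c);
    }
}
```

Calling dump_crt_punct(stdout, 0, 1000) prints the CRT's view of the first 1000 code points, which can then be diffed against the same scan done with Char.IsPunctuation.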