Re: Char.IsPunctuation vs. CRT is(w)punct


"Jeff Pek (Autodesk)" <jeff.pek@xxxxxxxxxxxxxxxxxxx> wrote in message
news:ORQer8EBHHA.1780@xxxxxxxxxxxxxxxxxxxxxxx
Thanks, all. This is all good stuff. What I was trying to do was to mimic
the behavior of iswpunct (and therefore the existing code). PInvoking
iswpunct seems reasonable, provided that I know that that DLL is going to
be there.

MSVCRT.DLL has shipped with recent versions of Windows, and with service
packs for not-so-recent versions, and it exports all the character
classification functions.
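
To make the "call the CRT export directly" idea concrete, here is a small sketch that uses Python's ctypes as a stand-in for P/Invoke. The helper names (load_c_runtime, crt_iswpunct) are made up for illustration, and off Windows the sketch falls back to whatever C library the process already has loaded, which is an assumption rather than anything from this thread. One detail worth copying into any wrapper: iswpunct is only specified to return nonzero for a match, not 1, so compare against 0 before treating the result as a boolean.

```python
import ctypes
import sys


def load_c_runtime():
    """Load msvcrt.dll on Windows, or the already-loaded C library elsewhere."""
    if sys.platform == "win32":
        return ctypes.CDLL("msvcrt")
    # Assumption: on POSIX systems the process's own symbols include libc.
    return ctypes.CDLL(None)


_crt = load_c_runtime()
# wint_t is 16 bits wide in the Microsoft CRT and 32 bits on most other platforms.
_crt.iswpunct.argtypes = [ctypes.c_ushort if sys.platform == "win32" else ctypes.c_uint]
_crt.iswpunct.restype = ctypes.c_int


def crt_iswpunct(ch):
    """Return True if the C runtime classifies the single character ch as punctuation.

    The raw return value is "nonzero", not necessarily 1, so it is
    normalized here instead of being compared with True directly.
    Intended for characters in the Basic Multilingual Plane.
    """
    return _crt.iswpunct(ord(ch)) != 0
```

For example, crt_iswpunct("$") is true, because the C runtime counts every printable ASCII character that is neither alphanumeric nor a space as punctuation.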


- jp

"Chris Mullins" <cmullins@xxxxxxxxx> wrote in message
news:uY19mXtAHHA.144@xxxxxxxxxxxxxxxxxxxxxxx

I should add that there is an open-source C# implementation of stringprep
that is part of libidn. This implementation is a bit memory hungry, and not
exactly tuned for optimal performance, but it works.


--
Chris Mullins, MCSD.NET, MCPD:Enterprise
http://www.coversant.net/blogs/cmullins


"Chris Mullins" <cmullins@xxxxxxxxx> wrote in message
news:u$qVDStAHHA.2276@xxxxxxxxxxxxxxxxxxxxxxx
Once you hit Unicode land, I think determining punctuation is difficult.
There is a good answer though:
Stringprep - http://www.ietf.org/rfc/rfc3454.txt

Stringprep addresses case folding, whitespace, prohibited characters,
bidirectional validity, and normalization form.

An example profile is nameprep, which is how Internationalized Domain
Names work:
http://tools.ietf.org/html/rfc3491

Another example profile is "resourceprep" which is part of the XMPP
standard:
http://www.xmpp.org/internet-drafts/attic/draft-ietf-xmpp-resourceprep-03.html

For example, this profile prohibits all characters in:
Table C.1.2
Table C.2.1
Table C.2.2
Table C.3
Table C.4
Table C.5
Table C.6
Table C.7
Table C.8
Table C.9

It specifies Unicode normalization form KC, and that bidirectional
checking must be performed.

--
Chris Mullins
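
A rough sketch of the checks described above (normalize to form KC, reject anything in the listed C tables, then run the bidirectional check), written in Python because its standard library ships the RFC 3454 tables in the stringprep module. The function name check_resource is made up for illustration. This is not libidn's API, and it leaves out any mapping step that a complete profile applies before normalization.

```python
import stringprep
# RFC 3454 is defined against Unicode 3.2, so use that version of the database.
from unicodedata import ucd_3_2_0 as ucd

# The prohibited-output tables listed for the profile above.
PROHIBITED_TABLES = (
    stringprep.in_table_c12,
    stringprep.in_table_c21,
    stringprep.in_table_c22,
    stringprep.in_table_c3,
    stringprep.in_table_c4,
    stringprep.in_table_c5,
    stringprep.in_table_c6,
    stringprep.in_table_c7,
    stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def check_resource(text):
    """Return text normalized to NFKC, or raise ValueError if it is not allowed."""
    normalized = ucd.normalize("NFKC", text)

    # Prohibited characters: any hit in any listed table rejects the string.
    for ch in normalized:
        if any(in_table(ch) for in_table in PROHIBITED_TABLES):
            raise ValueError("prohibited character U+%04X" % ord(ch))

    # Bidirectional check (RFC 3454 section 6): a string containing any
    # right-to-left character (table D.1) must not contain left-to-right
    # characters (table D.2), and must start and end with a D.1 character.
    if any(stringprep.in_table_d1(ch) for ch in normalized):
        if any(stringprep.in_table_d2(ch) for ch in normalized):
            raise ValueError("mixed right-to-left and left-to-right characters")
        if not (stringprep.in_table_d1(normalized[0])
                and stringprep.in_table_d1(normalized[-1])):
            raise ValueError("right-to-left string must start and end with an RTL character")

    return normalized
```

Note the order: normalization runs first, so a character that form KC folds into an allowed one (a ligature, for instance) is accepted in its folded form rather than rejected.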

"Jeff Pek (Autodesk)" <jeff.pek@xxxxxxxxxxxxxxxxxxx> wrote in message
news:%233seADaAHHA.5060@xxxxxxxxxxxxxxxxxxxxxxx
Hi all -

A KB article indicates that Char.IsPunctuation is the "equivalent" of
the CRT's isXpunct (e.g., iswpunct) function in .NET. However, I've
found significant differences in their behaviors. As a test, I ran each
function through the first 1000 or so Unicode characters, and found the
results that follow. They identify the characters for which the two
functions returned different results, and show what the .NET method
said. I'm sure there are other differences later on in the character
set.

So far, I haven't seen any documentation regarding the specific
differences. I wonder if anything exists. Thanks for any pointers.

Regards,
Jeff

--------

IsPunctuation mismatch: ! (33). .NET says: True
IsPunctuation mismatch: " (34). .NET says: True
IsPunctuation mismatch: # (35). .NET says: True
IsPunctuation mismatch: $ (36). .NET says: False
IsPunctuation mismatch: % (37). .NET says: True
IsPunctuation mismatch: & (38). .NET says: True
IsPunctuation mismatch: ' (39). .NET says: True
IsPunctuation mismatch: ( (40). .NET says: True
IsPunctuation mismatch: ) (41). .NET says: True
IsPunctuation mismatch: * (42). .NET says: True
IsPunctuation mismatch: + (43). .NET says: False
IsPunctuation mismatch: , (44). .NET says: True
IsPunctuation mismatch: - (45). .NET says: True
IsPunctuation mismatch: . (46). .NET says: True
IsPunctuation mismatch: / (47). .NET says: True
IsPunctuation mismatch: : (58). .NET says: True
IsPunctuation mismatch: ; (59). .NET says: True
IsPunctuation mismatch: < (60). .NET says: False
IsPunctuation mismatch: = (61). .NET says: False
IsPunctuation mismatch: > (62). .NET says: False
IsPunctuation mismatch: ? (63). .NET says: True
IsPunctuation mismatch: @ (64). .NET says: True
IsPunctuation mismatch: [ (91). .NET says: True
IsPunctuation mismatch: \ (92). .NET says: True
IsPunctuation mismatch: ] (93). .NET says: True
IsPunctuation mismatch: ^ (94). .NET says: False
IsPunctuation mismatch: _ (95). .NET says: True
IsPunctuation mismatch: ` (96). .NET says: False
IsPunctuation mismatch: { (123). .NET says: True
IsPunctuation mismatch: | (124). .NET says: False
IsPunctuation mismatch: } (125). .NET says: True
IsPunctuation mismatch: ~ (126). .NET says: False
IsPunctuation mismatch: ¡ (161). .NET says: True
IsPunctuation mismatch: ¢ (162). .NET says: False
IsPunctuation mismatch: £ (163). .NET says: False
IsPunctuation mismatch: ¤ (164). .NET says: False
IsPunctuation mismatch: ¥ (165). .NET says: False
IsPunctuation mismatch: ¦ (166). .NET says: False
IsPunctuation mismatch: § (167). .NET says: False
IsPunctuation mismatch: ¨ (168). .NET says: False
IsPunctuation mismatch: © (169). .NET says: False
IsPunctuation mismatch: ª (170). .NET says: False
IsPunctuation mismatch: « (171). .NET says: True
IsPunctuation mismatch: ¬ (172). .NET says: False
IsPunctuation mismatch: - (173). .NET says: True
IsPunctuation mismatch: ® (174). .NET says: False
IsPunctuation mismatch: ¯ (175). .NET says: False
IsPunctuation mismatch: ° (176). .NET says: False
IsPunctuation mismatch: ± (177). .NET says: False
IsPunctuation mismatch: ² (178). .NET says: False
IsPunctuation mismatch: ³ (179). .NET says: False
IsPunctuation mismatch: ´ (180). .NET says: False
IsPunctuation mismatch: µ (181). .NET says: False
IsPunctuation mismatch: ¶ (182). .NET says: False
IsPunctuation mismatch: · (183). .NET says: True
IsPunctuation mismatch: ¸ (184). .NET says: False
IsPunctuation mismatch: ¹ (185). .NET says: False
IsPunctuation mismatch: º (186). .NET says: False
IsPunctuation mismatch: » (187). .NET says: True
IsPunctuation mismatch: ¼ (188). .NET says: False
IsPunctuation mismatch: ½ (189). .NET says: False
IsPunctuation mismatch: ¾ (190). .NET says: False
IsPunctuation mismatch: ¿ (191). .NET says: True
IsPunctuation mismatch: × (215). .NET says: False
IsPunctuation mismatch: ÷ (247). .NET says: False
IsPunctuation mismatch: ; (894). .NET says: True
IsPunctuation mismatch: · (903). .NET says: True
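
One way to read the True/False column: Char.IsPunctuation is documented as testing the Unicode general category (the punctuation categories, whose abbreviations all start with "P"), whereas the C runtime in its default locale treats every printable ASCII character that is neither alphanumeric nor a space as punctuation. The sketch below reproduces that split for the ASCII range using Python's unicodedata module. The function names are made up for illustration, string.punctuation stands in for the C locale's ispunct set, and Python's Unicode tables are not necessarily the same version .NET uses, so individual non-ASCII results (173, for instance) may differ.

```python
import string
import unicodedata


def is_unicode_punctuation(ch):
    """True if ch is in a Unicode punctuation category (Pc, Pd, Ps, Pe, Pi, Pf, Po)."""
    return unicodedata.category(ch).startswith("P")


def is_c_locale_punct(ch):
    """True if ch is one of the 32 ASCII characters ispunct accepts in the "C" locale."""
    return ch in string.punctuation


def ascii_differences():
    """Return (character, category) pairs where the two definitions disagree in ASCII."""
    return [
        (ch, unicodedata.category(ch))
        for ch in map(chr, range(128))
        if is_unicode_punctuation(ch) != is_c_locale_punct(ch)
    ]


if __name__ == "__main__":
    for ch, category in ascii_differences():
        print("%s (%d): Unicode category %s" % (ch, ord(ch), category))
```

The characters this reports are exactly the ASCII entries listed with "False" above: $ is a currency symbol (Sc), + < = > | ~ are math symbols (Sm), and ^ and ` are modifier symbols (Sk). None of them is in a punctuation category, so a category-based test rejects them while the C runtime accepts them.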









