Re: Replace special characters
- From: Gabriela <frohlinger@xxxxxxxxx>
- Date: Fri, 13 Mar 2009 23:28:42 -0700 (PDT)
On Mar 13, 7:44 pm, "Alex K. Angelopoulos" <aka(at)mvps.org> wrote:
Actually, it looks like this thing centers around the categories as defined
by VBScript's regular expression engine.
I don't know a great deal about Unicode - just some things from mucking
around with high-order special characters a few years ago. What you're
describing actually depends on the regular expression engine, not the LCID -
although it could come out to the same thing if a particular regex engine
uses OS APIs to classify characters.
I did some testing, using the following code in WSH to see what characters
are considered "word characters" by VBScript:
' Echo every character code from 128 up that the engine matches with \w
Set rx = New RegExp
rx.Pattern = "\w"
For i = 128 To 65535
    c = ChrW(i)
    If rx.Test(c) Then WScript.Echo i, c
Next
The answer is pretty wacky - exactly one character in that range is treated
as a word character, the following one (if you can see it):
İ
That's the "Latin Capital Letter I with Dot Above", which has a character
code of 304 (0x130).
If you read the documentation, the definition for the \w sequence says
"Matches any word character including underscore. Equivalent to
'[A-Za-z0-9_]'."
so it looks like it's roughly correct, with the exception of the
dotted-capital I.
In any case, what it apparently comes down to is that the VBScript regex
engine's \w is NOT going to work for characters outside the ASCII range.
A robust solution is going to require an engine that does support Unicode.
"Paul Randall" <paul...@xxxxxxx> wrote in message
news:uvNGNf2oJHA.1172@xxxxxxxxxxxxxxxxxxxxxxx
Hi, Alex
I'm thinking that too. I don't know Unicode very well, but I'm thinking
that a particular Unicode code point might be a special character in one
LCID but not in some other LCID. Or maybe it is not LCID dependent.
Knowing some code point/LCID combinations would make it easy to look them
up in the code charts at Unicode.org.
-Paul Randall
"Alex K. Angelopoulos" <aka(at)mvps.org> wrote in message
news:O1JGPBxoJHA.1288@xxxxxxxxxxxxxxxxxxxxxxx
Could you post the unicode character codes for a few sample characters
you do want filtered and don't want filtered out so I can check
something? I'm guessing that the character set doesn't match high unicode
characters properly. It's theoretically possible to do filtering for a
range of character codes, but I'm suspicious that you may also have
special characters in those high ranges that you want filtered out and
using a simple range won't work for that. Some example characters might
help me check the possibilities.
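
The range idea mentioned above could be sketched roughly as follows. This is only a minimal sketch, not a tested solution: the function name ReplaceAsciiSpecials is made up for illustration, and it assumes the VBScript engine accepts \uXXXX escapes as the endpoints of a range inside a character class. It treats every character from U+0080 upward as allowed, so non-ASCII punctuation (curly apostrophes, for example) passes through unchanged, which is exactly the limitation of a simple range.

```vbscript
' Hypothetical helper: replace ASCII "special" characters with "-",
' leaving spaces, ASCII word characters and everything from U+0080 up as is.
Function ReplaceAsciiSpecials(text)
    Dim rx
    Set rx = New RegExp
    ' Negated class: match anything that is NOT a space, NOT an ASCII
    ' word character, and NOT in the range U+0080..U+FFFF.
    ' (Assumption: \uXXXX range endpoints are supported in a class.)
    rx.Pattern = "[^ \w\u0080-\uFFFF]"
    rx.Global = True
    ReplaceAsciiSpecials = rx.Replace(text, "-")
    Set rx = Nothing
End Function

' Usage: the accented letters are built with ChrW so the script file
' itself can stay plain ASCII.
WScript.Echo ReplaceAsciiSpecials("caf" & ChrW(233) & " & th" & ChrW(233) & "!")
```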
"Gabriela" <frohlin...@xxxxxxxxx> wrote in message
news:fb3d5f5e-6962-40b7-a8bb-f8a6c822923c@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
On Mar 8, 9:50 pm, Gabriela <frohlin...@xxxxxxxxx> wrote:
On Mar 8, 6:53 pm, "Alex K. Angelopoulos" <aka(at)mvps.org> wrote:
You haven't explained precisely what you mean by "special characters", but
I'm guessing you mean anything that is not a word character and is not a
space. For that, try this regular expression:
oreg_exp.Pattern = "[^ \w]"
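
For context, here is a minimal sketch of how that pattern might be wrapped in a function. The name ReplaceSpecialChars is hypothetical, and the variable name simply follows the code quoted elsewhere in this thread.

```vbscript
' Hypothetical wrapper around the "[^ \w]" pattern: every character that is
' neither a space nor a word character is replaced with "-".
Function ReplaceSpecialChars(text)
    Dim oreg_exp
    Set oreg_exp = New RegExp
    oreg_exp.Pattern = "[^ \w]"
    oreg_exp.Global = True   ' replace every match, not only the first
    ReplaceSpecialChars = oreg_exp.Replace(text, "-")
    Set oreg_exp = Nothing
End Function

WScript.Echo ReplaceSpecialChars("Hello, world! (test)")
```

Global has to be True here, otherwise only the first match is replaced.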
"Gabriela" <frohlin...@xxxxxxxxx> wrote in message
news:c3a12250-0107-45b8-a580-1668ca7c7119@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hi,
I am trying to write a function that receives strings that may contain
any Unicode characters, and replaces all special characters
(!@#$%^&*()><?....) with "-", but leaves letters as is.
I've tried to use a regular expression with all the special chars in the
ASCII table - but it still doesn't cover everything.
I cannot use a "whitelist" (letters that are allowed) instead of a
"blacklist" (special chars NOT allowed) - because I don't know the
letters of all the languages I'd like to support (English, Latin,
Chinese, Arabic...).
Any ideas what I can do?
Thanks,
Gabi.
This is my "blacklist" regular expression code - but it does not
always succeed...
Dim oreg_exp
Set oreg_exp = New RegExp
'oreg_exp.Pattern = "[^a-z0-9]"
oreg_exp.Pattern = "([{}\(\)\^$&._%#!@=<>:;,~`'\'\*\?\/\+\|\[\\\\]|\]|\-)"
oreg_exp.IgnoreCase = True
oreg_exp.Global = True
title = oreg_exp.Replace(title, "-")
Set oreg_exp = Nothing
That's all I needed. The little "[^ \w]" - thanks a lot!!
Ohhh, no, this helps only for English chars. When I tried it with
other languages, it didn't work - their letters were replaced by the
regular expression too. I need to support all letters in all languages...
Gabi.
Hi,
I can't copy&paste the string that is giving me trouble into this post,
because when I do that, the problematic apostrophes are replaced with
regular special chars, and that fixes the problem.
So look at this link:
http://www.grazeit.com/test_str.asp - it contains the problematic
special chars from my DB, with the failed conversion. If I copy the
string from the DB and paste it back into the code, the conversion
is OK.
Thanks,
Gabi.