Re: Replace special characters

On Mar 13, 7:44 pm, "Alex K. Angelopoulos" <aka(at)mvps.org> wrote:
Actually, it looks like this thing centers around the categories as defined
by VBScript's regular expression engine.

I don't know a great deal about Unicode - just some things from mucking
around with high-order special characters a few years ago. What you're
saying is actually rooted in the regular expression engine, not the LCID -
although it could come out to the same thing if a particular regex engine
uses arbitrary OS APIs to classify characters.

I did some testing, using the following code in WSH to see what characters
are considered "word characters" by VBScript:

    ' Echo every code point above ASCII that the engine treats as a word character
    Set rx = New RegExp
    rx.Pattern = "\w"
    For i = 128 To 65535
        c = ChrW(i)
        If rx.Test(c) Then WScript.Echo i, c
    Next

The answer is pretty wacky - exactly one character above the ASCII range
counts as a word character, the following one (if you can see it):
    İ
That's the "Latin Capital Letter I with Dot Above", which has a character
code of 304 (0x130).

If you read the documentation, the definition for the \w sequence says
    "Matches any word character including underscore. Equivalent to
'[A-Za-z0-9_]'."
so it looks like it's roughly correct, with the exception of the
dotted-capital I.

In any case, what it apparently comes down to is that the \w class in the
VBScript regex engine is NOT going to work for characters outside the ASCII
range. A robust solution is going to require an engine that does support
Unicode.
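
For comparison, here is a minimal sketch of the same "[^ \w]" idea in an engine that does classify characters by Unicode properties. It uses Python 3's standard `re` module, where `\w` is Unicode-aware for text patterns. Python is an assumption here (it is not used elsewhere in this thread), and the function name `replace_special` and the sample string are made up for illustration.

```python
import re

# In Python 3, \w in a str pattern matches letters and digits from any
# script, plus the underscore, so this class means "not a word character
# and not a space".
SPECIAL = re.compile(r"[^\w ]")

def replace_special(text, replacement="-"):
    """Replace every character that is neither a word character nor a space."""
    return SPECIAL.sub(replacement, text)

# Hypothetical sample: a typographic apostrophe (U+2019) plus non-Latin letters.
sample = "Gabi\u2019s title: 東京 & Москва!"
print(replace_special(sample))  # prints: Gabi-s title- 東京 - Москва-
```

Two caveats: the underscore is a word character and so is kept, and combining marks are not matched by `\w` in this module, so scripts that depend on them would probably need extra handling.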

"Paul Randall" <paul...@xxxxxxx> wrote in message
news:uvNGNf2oJHA.1172@xxxxxxxxxxxxxxxxxxxxxxx

Hi, Alex
I'm thinking that too.  I don't know Unicode very well, but I'm thinking
that a particular Unicode code point might be a special character in one
LCID but not in some other LCID.  Or maybe it is not LCID dependent.
Knowing some code point/LCID combinations would make it easy to look them
up in the code charts at Unicode.org.

-Paul Randall

"Alex K. Angelopoulos" <aka(at)mvps.org> wrote in message
news:O1JGPBxoJHA.1288@xxxxxxxxxxxxxxxxxxxxxxx
Could you post the unicode character codes for a few sample characters
you do want filtered and don't want filtered out so I can check
something? I'm guessing that the character set doesn't match high unicode
characters properly. It's theoretically possible to do filtering for a
range of character codes, but I'm suspicious that you may also have
special characters in those high ranges that you want filtered out and
using a simple range won't work for that. Some example characters might
help me check the possibilities.
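
The range-based filtering mentioned above could look roughly like the sketch below, written in Python for illustration. The ranges are an assumption chosen for this example (ASCII punctuation plus the General Punctuation block, which holds the typographic quotes and dashes); they are not a complete list of special characters.

```python
# Hypothetical blacklist of code-point ranges (inclusive).
SPECIAL_RANGES = [
    (0x0021, 0x002F),  # ! " # $ % & ' ( ) * + , - . /
    (0x003A, 0x0040),  # : ; < = > ? @
    (0x005B, 0x0060),  # [ \ ] ^ _ `
    (0x007B, 0x007E),  # { | } ~
    (0x2000, 0x206F),  # General Punctuation block (curly quotes, dashes, ...)
]

def is_special(ch):
    """True if the character's code point falls inside a blacklisted range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in SPECIAL_RANGES)

def replace_by_range(text, replacement="-"):
    """Replace every blacklisted character, leaving everything else as is."""
    return "".join(replacement if is_special(ch) else ch for ch in text)

print(replace_by_range("it\u2019s a@b"))
```

The weakness is the one suspected above: symbols also sit among letters in the higher ranges. The multiplication sign U+00D7, for example, lies between accented Latin letters and passes through this filter untouched, and U+00A1 to U+00BF mixes punctuation with letters such as U+00AA and U+00BA. A list of simple ranges therefore either misses symbols or removes letters.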

"Gabriela" <frohlin...@xxxxxxxxx> wrote in message
news:fb3d5f5e-6962-40b7-a8bb-f8a6c822923c@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
On Mar 8, 9:50 pm, Gabriela <frohlin...@xxxxxxxxx> wrote:
On Mar 8, 6:53 pm, "Alex K. Angelopoulos" <aka(at)mvps.org> wrote:

You haven't explained precisely what you mean by "special characters",
but I'm guessing you mean anything that is not a word character and is
not a space. For that, try this regular expression:

oreg_exp.Pattern = "[^ \w]"

"Gabriela" <frohlin...@xxxxxxxxx> wrote in message
news:c3a12250-0107-45b8-a580-1668ca7c7119@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Hi,
I am trying to write a function that receives strings made up of any
Unicode characters and replaces all special characters
(!@#$%^&*()><?....) with "-", but leaves letters as is.
I've tried to use a regular expression with all the special chars in
the ASCII table - but it still doesn't cover everything.
I cannot use a "whitelist" (letters that are allowed) instead of a
"blacklist" (special chars NOT allowed), because I don't know the
letters of all the languages I'd like to support (English, Latin,
Chinese, Arabic...).
Any ideas what I can do?
Thanks,
Gabi.

This is my "blacklist" regular expression code - but it does not
always succeed...

dim oreg_exp
set oreg_exp = new RegExp
'oreg_exp.Pattern = "[^a-z0-9]"
oreg_exp.Pattern = "([{}\(\)\^$&._%#!@=<>:;,~`'\'\*\?\/\+\|\[\\\\]|\]|\-)"
oreg_exp.IgnoreCase = true
oreg_exp.global = true
title = oreg_exp.replace (title,"-")
Set oreg_exp = Nothing

That's all I needed. the little "[^ \w]" - thanks a lot!!

Ohhh, no, this helps only for English chars. When I tried it with
other languages' chars, it didn't work - the regular expression
removed them. I need to support all letters in all languages...
Gabi.

Hi,

I can't copy and paste the string that is giving me trouble into this
post, because when I do that the problematic apostrophes are replaced
with regular special chars, and that fixes the problem.
So look at this link:
http://www.grazeit.com/test_str.asp - it contains the problematic
special chars from my DB, with the failed conversion. If I copy the
string from the DB and paste it back into the code, the conversion is
OK.
Thanks,
Gabi.
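
When characters change on copy and paste like this, it can be easier to report their code points than the characters themselves. The sketch below, again in Python for illustration, lists the code point and Unicode name of every non-ASCII character in a string. The helper name `describe` and the sample string are hypothetical; the actual characters in the database are not known from this thread.

```python
import unicodedata

def describe(text):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) > 127
    ]

# Hypothetical sample with typographic quotes, which often turn into
# plain ASCII quotes when pasted.
for ch, code_point, name in describe("Gabi\u2019s \u201ctest\u201d"):
    print(code_point, name)
```

With the code points in hand, each character can be looked up in the code charts at Unicode.org.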