Re: Which RegEx Testing Tool Do You Prefer?
- From: "clintonG" <csgallagher@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Wed, 12 Oct 2005 17:13:29 -0500
Hello Kevin,
With the exception of the bass-ackwards Lookarounds, your point that
expressions are sequential in nature is exactly what I am trying to nail down.
What confuses me is that they appear to be constructed like set-theory or
algebraic expressions, whose terms are resolved from the innermost set
working outwards, yet when a tool processes them they appear to be parsed
linearly, as if serialized. Does that make sense to you?
I sure wish Microsoft would publish a document explaining how regular
expressions are parsed, the same way they have explained how the life cycle
of a page is processed. The control tree is easy to understand
theoretically. I can only wonder why regular expression processing cannot be
similarly mapped and explained.
I've tried the following:
* RegexDesigner.Net
* Rad Regex Designer
* Regular Expression Tester
* Regulator RegEx
* Expresso
Downloading this afternoon:
* Regex Coach [1] (donationware)
Regex Coach is also said to have an English-language 'analyzer' like the one
in Expresso, which I consider to have the best interface and functionality,
though it still can't make a genius out of a dummy :-).
I'm going to delve into some lists and forums [2] for the next week to see
what I can learn.
This book [3] sounds interesting. I'll take a trip to Barnes and Noble to
see if it's on the shelves.
I also have this book [4], which is very good despite what the "10 Minutes"
title implies, and of course I'll be looking for [5] from O'Reilly.
Let's see what that does for now.
<%= Clinton Gallagher
[1] http://www.weitz.de/regex-coach/
[2] http://regexadvice.com/
[3] http://books.slashdot.org/article.pl?sid=05/03/22/0810243&tid=156&tid=192&tid=6
[4] http://forta.com/books/0672325667/
[5] http://www.oreilly.com/catalog/regex2/index.html?CMP=ILL-4GV796923290
I appreciate your interest in this context. I've discovered a regular
expression mailing list that emanates from
"Kevin Spencer" <kevin@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:uHdYdA2zFHA.1192@xxxxxxxxxxxxxxxxxxxxxxx
> Hi Clinton,
>
> Regular Expressions are a bear to learn, even if you have good tools to
> work with them. I've spent hours working out a relatively "simple" one (at
> least it seemed simple at first), learning a bit more with each hour.
> Still, I'm a long way from being an expert. I can read most of it fairly
> well by now, but certain concepts are still a bit difficult to deal with. I
> still struggle some with Lookarounds in particular. One thing to keep in
> mind is that Regular Expressions consume a string as they move through it,
> with a few exceptions (like Lookarounds, which test what comes next without
> consuming it). They are basically sequential in nature.
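A minimal sketch of the "consume vs. peek" point, using Python's `re` as a stand-in for .NET's Regex (both treat these two constructs the same way). The sample address is an assumption for illustration: an ordinary token consumes what it matches, while a lookahead only checks what comes next.

```python
import re

text = "someone@somewhere.com"  # sample address (assumed for illustration)

# An ordinary token consumes what it matches: after "\w+@" the engine's
# position is just past the "@".
consumed = re.match(r"\w+@", text)
print(consumed.group(), consumed.end())   # someone@ 8

# A lookahead (?=@) only checks that "@" comes next without consuming it,
# so the match ends just before the "@".
peeked = re.match(r"\w+(?=@)", text)
print(peeked.group(), peeked.end())       # someone 7
```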
>
> You may find the "Analyze" tool helpful with this sort of thing.
> Fortunately, I have not 2 but THREE Regular Expression tools to work with
> (2 of them are Freeware), which enables me to use the one(s) that are best
> for the particular type of work I need regarding any individual Regular
> Expression and/or problem with one.
>
> The expression you posted,
>
> \w*@\w*\.\w*((\.\w*)*)?
>
> can be analyzed, in so many words, as follows (the right-hand column shows
> the part of the email address matched at each step, starting where the
> match begins):
>
>   Match any word character, zero or more times       \w*        someone
>   Next, match the '@' character once                 @          @
>   Next, match any word character zero or more times  \w*        somewhere
>   Next, match the '.' character once                 \.         .
>   Next, match any word character zero or more times  \w*        com
>   Next, capture the following into Group 1 zero or one time:   (......)?
>     Capture the following into Group 2 zero or more times:     (......)*
>       Match the '.' character once                   \.
>       Match any word character zero or more times    \w*
>   Result of Group 1: (\.\w*)* (i.e. Group 2, repeated)          (nothing)
>   Result of Group 2: \.\w*                                       (nothing)
>
> Basically, there is no match for either Group 1 or Group 2, as the only '.'
> in the address has already been consumed by the earlier \. token. However,
> as both Groups specify a minimum of zero times, they don't disqualify the
> Match; they simply appear zero times each.
>
> Why does Expresso report Group 1 at position 32? Well, that is where the
> match ends: nothing more was matched before that point, so that's where
> the empty (null) match begins. Why does Expresso report Group 2 at
> position 0? Well, I'm not that good with it!
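A Python sketch of what the engine records for this pattern. The thread masks the address, so "someone@somewhere.com" is an assumption reconstructed from Kevin's step-by-step walk-through; with it, "An example " is 11 characters and the address is 21, so position 32 is exactly the end of the match. Group 1 did participate (it matched an empty string there); Group 2 never participated. Python reports that as `None` with span `(-1, -1)`; as far as I know, .NET reports such a group with `Success == false` and `Index`/`Length` of 0, which would explain the "position 0" Expresso shows for Group 2.

```python
import re

# Sample string from the thread; the masked address is assumed to be
# "someone@somewhere.com" based on the analysis above.
text = "An example someone@somewhere.com of an email address."
pattern = re.compile(r"\w*@\w*\.\w*((\.\w*)*)?")

m = pattern.search(text)
print("match:", m.group(0), m.span(0))          # someone@somewhere.com (11, 32)
# Group 1 matched the empty string where the overall match ends.
print("group 1:", repr(m.group(1)), m.span(1))  # '' (32, 32)
# Group 2 never matched at all.
print("group 2:", m.group(2), m.span(2))        # None (-1, -1)
```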
>
> Still, your regular expression is a bit lax in terms of standards. We
> worked one up for valid email addresses the other day, and you may want to
> borrow it:
>
> (?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz|info|name|museum|coop|aero|[a-z]{2})))
>
> It is case-insensitive, and matches both domain-name and IP-address email
> addresses. It puts the results into 4 possible groups:
>
> 1-User Name, 2-Domain IP Address, 3-Domain Name, 4-Root Domain.
>
> Note that group 2 and groups 3 and 4 are mutually exclusive. The email
> address can use either an IP address or a named domain, but not both. It
> supports 2-letter country suffixes and multiple-dot domain addresses.
>
> I'm not sure we covered all the possible permutations, but it's pretty
> strong.
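To try Kevin's pattern outside a GUI tool, here is a Python sketch. The pattern is unchanged apart from being split across adjacent raw strings; `describe()` is a hypothetical helper, `fullmatch()` is my choice so that the whole input must be an address, and the sample addresses are made up.

```python
import re

# Kevin's pattern, split over adjacent raw strings (Python joins them).
EMAIL = re.compile(
    r"(?i)([-.\w]+)\@(?:((?:\d{1,3}\.){3}\d{1,3})"
    r"|([-a-z0-9]+(?:\.[-a-z0-9]+)*)\.((?:com|edu|gov|int|mil|net|org|biz"
    r"|info|name|museum|coop|aero|[a-z]{2})))"
)

def describe(address):
    """Return (user, ip, domain, root) for a full match, else None."""
    m = EMAIL.fullmatch(address)
    return m.groups() if m else None

print(describe("someone@somewhere.com"))         # ('someone', None, 'somewhere', 'com')
print(describe("Kevin@192.168.0.1"))             # ('Kevin', '192.168.0.1', None, None)
print(describe("Bob.Smith@Mail.Example.CO.UK"))  # ('Bob.Smith', None, 'Mail.Example.CO', 'UK')
```

One permutation it doesn't cover: `\d{1,3}` doesn't range-check octets, so `x@999.999.999.999` is accepted as an IP address.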
>
> --
> HTH,
>
> Kevin Spencer
> Microsoft MVP
> .Net Developer
> Ambiguity has a certain quality to it.
>
>
> "clintonG" <csgallagher@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
> news:%2333SYS0zFHA.3720@xxxxxxxxxxxxxxxxxxxxxxx
>> Hello Kevin,
>>
>> Well I'm bright eyed but not so bushy-tailed this morning. Thanks for
>> working this out. It's one of those 'must know' issues one needs to be
>> concerned with when generating valid XML from an application. I'll be
>> working with it later today and I'm starting to get a feel for Expresso
>> which I have a question about. I'm at the point where I've almost come to
>> understand how expressions are actually processed which -- for me --
>> means I will understand how I need to think to put them together. You've
>> been a real help again and your source is an inspiration which shows how
>> elegant self-documenting code can be.
>>
>> As for the Expresso question, what is 1:? supposed to indicate? (noting
>> that's the closest I could come at the moment to replicate the
>> rectangular 'non-printable' character Expresso uses to indicate some
>> 'thing' it has matched) In the following simple example it seems to match
>> a white space although in a manner that is confusing as I will point out
>> but in other examples with many more characters and white space in the
>> string to be matched I have counted the position where the ? is said to
>> be matched and the position reported does not fall on a white space at
>> all.
>>
>> // Expression
>> \w*@\w*\.\w*((\.\w*)*)?
>>
>> // String to match
>> An example someone@xxxxxxxxxxxxx of an email address.
>>
>> Expresso reports 1:? at Position 32, Length 0, which implies white space
>> in the simple example as given. There were white space characters before
>> the matched characters, which makes one ask why Expresso would ignore
>> those earlier white space characters and then report 2:? at Position 0,
>> Length 0, which suggests the parser returned to the beginning of the
>> string to be matched and found what?
>>
>> Is this clear as mud or what :-)
>>
>> <%= Clinton Gallagher
>>
>>
>> "Kevin Spencer" <kevin@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
>> news:%23hNIR6nzFHA.2884@xxxxxxxxxxxxxxxxxxxxxxx
>>> Hi Clinton,
>>>
>>> The following Regular Expression will give you the ability to do a
>>> Regex.Replace on a string containing both single "&" characters and
>>> "&amp;" strings. It captures the "&amp;" strings into their own separate
>>> matches, and the "&" characters into their own matches, putting the "&"
>>> characters into a Group. It is also case-insensitive:
>>>
>>> (?i)[^&][^&]*|&amp;|(&(?!amp;))
>>>
>>> Here's some sample code for replacing the single "&" characters with
>>> "&amp;":
>>>
>>> using System.Text.RegularExpressions;
>>>
>>> /// <summary>
>>> /// Replaces the Ampersand in a Match with "&amp;"
>>> /// </summary>
>>> /// <param name="m">Match</param>
>>> /// <returns>Replaced Match value</returns>
>>> public static string ampReplacer(Match m)
>>> {
>>>     // Only a bare "&" is captured in Group 1; other matches pass through
>>>     if (!m.Groups[1].Success) return m.Value;
>>>     return m.Value.Replace("&", "&amp;");
>>> }
>>>
>>> /// <summary>
>>> /// Replaces all single Ampersand characters in a string with "&amp;"
>>> /// </summary>
>>> /// <param name="s">String to process</param>
>>> /// <returns>Processed String</returns>
>>> public static string ReplaceAmpersand(string s)
>>> {
>>>     return Regex.Replace(s, @"(?i)[^&][^&]*|&amp;|(&(?!amp;))",
>>>         new MatchEvaluator(ampReplacer));
>>> }
>>>
>>> The "ampReplacer" function is passed as the MatchEvaluator delegate to
>>> Regex.Replace() in the "ReplaceAmpersand" method. "ReplaceAmpersand"
>>> takes a string as an argument and uses Regex.Replace to replace every
>>> match in the string that has a value in Groups[1] with "&amp;".
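For readers outside .NET, here is a Python port of the same idea: `re.sub()` accepts a function in the role of the MatchEvaluator. The names `AMP`, `amp_replacer` and `replace_ampersand` are illustrative, and the lookahead is written `(?!amp;)` ("not followed by amp;").

```python
import re

# Same three alternatives as the C# version:
#   [^&][^&]*     runs of text without "&"    -> returned unchanged
#   &amp;         an existing entity          -> returned unchanged
#   (&(?!amp;))   any other "&", in group 1   -> escaped
AMP = re.compile(r"(?i)[^&][^&]*|&amp;|(&(?!amp;))")

def amp_replacer(m):
    # Only matches that used group 1 are bare ampersands.
    return "&amp;" if m.group(1) is not None else m.group(0)

def replace_ampersand(s):
    return AMP.sub(amp_replacer, s)

print(replace_ampersand("Lawn Mowers, Repairs & Services - lawnmowers.com"))
# Lawn Mowers, Repairs &amp; Services - lawnmowers.com
```

If you don't need the per-match hook, `re.sub(r"(?i)&(?!amp;)", "&amp;", s)` has the same effect. Either way, other entities such as `&lt;` get escaped again (`&amp;lt;`), so the lookahead would need widening if titles can contain them.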
>>>
>>> As a side note, I used both Expresso and Regex Buddy to come up with
>>> this. It was indeed a challenge, as I'm not quite a master of Regular
>>> Expressions. But I enjoy learning, so it was a good exercise for me! :)
>>>
>>> --
>>> HTH,
>>>
>>> Kevin Spencer
>>> Microsoft MVP
>>> .Net Developer
>>> Ambiguity has a certain quality to it.
>>>
>>> "clintonG" <csgallagher@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in
>>> message news:%234B4aHgzFHA.904@xxxxxxxxxxxxxxxxxxxxxxx
>>>> Kevin, have you ever heard the expression "preaching to the choir?" :-)
>>>>
>>>> I've got the basic pattern matching theory understood, but it's the use
>>>> of expressions to disallow or replace certain characters and/or strings
>>>> that I'm trying to really understand thoroughly. The following example
>>>> illustrates...
>>>>
>>>> // Example
>>>> Lawn Mowers, Repairs & Services - lawnmowers.com
>>>>
>>>> A typical page title which, when entered into a TextBox meant to
>>>> capture string data for an RSS 2.0 title element, should use &amp;
>>>> instead of a bare & to represent the ampersand. I've got an expression
>>>> that works well for the example, but I can't figure out (with the
>>>> expression I have) how to match the & and replace it with &amp; (yet),
>>>> or how to use the expression I have to force the ASP.NET 2.0
>>>> RegularExpressionValidator to fail when the & is present in the string.
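For the validator half of the question, one approach is a pattern that only accepts text in which every "&" starts "&amp;". This is a Python sketch. My understanding, which you should verify, is that RegularExpressionValidator requires the expression to match the entire input; `fullmatch()` stands in for that check here. `is_valid_title` is a hypothetical name. I left out `(?i)` because the client-side check runs in JavaScript, whose regex syntax doesn't accept a leading `(?i)`.

```python
import re

# Every character is either a non-"&" or the start of a complete "&amp;".
TITLE_OK = re.compile(r"(?:[^&]|&amp;)*")

def is_valid_title(s):
    # fullmatch() mimics a validator that requires the whole input to match.
    return TITLE_OK.fullmatch(s) is not None

print(is_valid_title("Lawn Mowers, Repairs &amp; Services - lawnmowers.com"))  # True
print(is_valid_title("Lawn Mowers, Repairs & Services - lawnmowers.com"))      # False
```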
>>>>
>>>> // Expression
>>>> [a-z]+([a-z0-9-]*[a-z0-9]+)?(\.([a-z]+([a-z0-9-]*[a-z0-9]+)?)+)*
>>>>
>>>> I also really appreciate Expresso's Analyzer. It is outstanding that
>>>> Expresso seems to make it easy for us to pick expressions apart piece
>>>> by piece and explain them in English.
>>>>
>>>>
>>>> <%= Clinton Gallagher
>>>>
>>>> "Kevin Spencer" <kevin@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
>>>> news:Ow%23z$7azFHA.1264@xxxxxxxxxxxxxxxxxxxxxxx
>>>>> Hi Juan,
>>>>>
>>>>>> The kind of RegEx tool I'd like is one which can take a string
>>>>>> I write, and create a RegEx expression which matches it.
>>>>>
>>>>> The problem with that is that you can write a Regular Expression that
>>>>> matches a literal string quite easily. For example:
>>>>>
>>>>> literal string
>>>>>
>>>>> The above is a regular expression which will match the substring
>>>>> "literal string" in my first sentence. Of course, the real power of
>>>>> regular expressions is the ability to match *patterns* in a string,
>>>>> perform grouping, etc. So, like any programming language (which it is,
>>>>> in a sense), Regular Expressions have a shorthand syntax that allows
>>>>> one to create patterns of a large variety of types. A simple example
>>>>> of this would be:
>>>>>
>>>>> (literal) (string)
>>>>>
>>>>> This captures the same match as the first, but puts the string
>>>>> "literal" into a group, and the string "string" into a second group.
>>>>> But of course, we have already exceeded your desired requirement. On
>>>>> the other hand, we have made a regular expression that is perhaps more
>>>>> useful (in some situations) than the first.
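Kevin's two examples, run in Python against his own first sentence: both patterns match the same text, but the second also records the two words as groups. As a nod to Juan's wish, `re.escape()` is the "trivial" generator Kevin describes. It turns any text into a pattern that matches exactly that text, though it cannot generalize to a pattern. The sample "3.5 (approx.)" is made up.

```python
import re

sentence = ("The problem with that is that you can write a Regular Expression "
            "that matches a literal string quite easily.")

plain = re.search(r"literal string", sentence)        # no groups
grouped = re.search(r"(literal) (string)", sentence)  # two capturing groups

print(plain.group(0))      # literal string
print(grouped.group(0))    # literal string
print(grouped.groups())    # ('literal', 'string')

# re.escape() backslash-escapes special characters so the text is literal.
exact = re.escape("3.5 (approx.)")
print(bool(re.fullmatch(exact, "3.5 (approx.)")))   # True
print(bool(re.fullmatch(exact, "345 (approxX)")))   # False
```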
>>>>>
>>>>> And of course, the possible types and combinations of patterns are
>>>>> almost endless, including wildcard patterns, special characters,
>>>>> boolean rules, and so on.
>>>>>
>>>>> Yeah, it's like reading some kind of incredibly concise shorthand
>>>>> code, without even line breaks or brackets to help. That's why I was
>>>>> so pleased to see that Expresso allows you to break your regular
>>>>> expression across multiple lines while building it. That helps a good
>>>>> bit!
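Breaking a pattern across lines isn't limited to the tools. Python's `re.VERBOSE` ignores unescaped whitespace and treats `#` as the start of a comment. .NET offers the same through `RegexOptions.IgnorePatternWhitespace` or an inline `(?x)`. Here is Clinton's pattern from this thread laid out that way, as a sketch. The per-line comments are my reading of each piece.

```python
import re

# Clinton's pattern, one piece per line, with comments.
EMAIL_LAX = re.compile(r"""
    \w*                # user name (may be empty, hence "lax")
    @                  # literal at-sign
    \w*                # first domain label
    \.                 # a literal dot
    \w*                # next label, e.g. "com"
    ( ( \. \w* )* )?   # group 1 wraps group 2: any further ".label" parts
""", re.VERBOSE)

# The same pattern on one line, for comparison.
ONE_LINE = re.compile(r"\w*@\w*\.\w*((\.\w*)*)?")

s = "An example someone@somewhere.com of an email address."
print(EMAIL_LAX.search(s).group(0) == ONE_LINE.search(s).group(0))  # True
```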
>>>>>
>>>>> --
>>>>> HTH,
>>>>>
>>>>> Kevin Spencer
>>>>> Microsoft MVP
>>>>> .Net Developer
>>>>> Ambiguity has a certain quality to it.
>>>>>
>>>>> "Juan T. Llibre" <nomailreplies@xxxxxxxxxxx> wrote in message
>>>>> news:eivH4pazFHA.2880@xxxxxxxxxxxxxxxxxxxxxxx
>>>>>> The kind of RegEx tool I'd like is one which can take a string
>>>>>> I write, and create a RegEx expression which matches it.
>>>>>>
>>>>>> *That* will be the RegEx tool that will corner the market.
>>>>>>
>>>>>> Juan T. Llibre, ASP.NET MVP
>>>>>> ASP.NET FAQ : http://asp.net.do/faq/
>>>>>> Foros de ASP.NET en Español : http://asp.net.do/foros/
>>>>>> ======================================
>>>>>> "clintonG" <csgallagher@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in
>>>>>> message news:OfUTuiazFHA.1616@xxxxxxxxxxxxxxxxxxxxxxx
>>>>>>> Thanks Kevin. I saw that post too and am going to download Expresso
>>>>>>> in a few minutes. I know you don't need to be psychic to figure out
>>>>>>> what I'm likely to be asking next :-)
>>>>>>>
>>>>>>> <%= Clinton Gallagher
>>>>>>>
>>>>>>>
>>>>>>> "Kevin Spencer" <kevin@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
>>>>>>> news:O0evUEazFHA.2792@xxxxxxxxxxxxxxxxxxxxxxx
>>>>>>>>I saw a response to this question in the CSharp group, regarding a
>>>>>>>>product named "Expresso"
>>>>>>>>
>>>>>>>> http://www.ultrapico.com/Expresso.htm
>>>>>>>>
>>>>>>>> Expresso is .Net freeware, and after downloading, installing, and
>>>>>>>> playing with it, I'd give it a try! So far I have found it to be
>>>>>>>> excellent, having capabilities that Regex Buddy does not have, and
>>>>>>>> a much more intuitive GUI.
>>>>>>>>
>>>>>>>> --
>>>>>>>> HTH,
>>>>>>>>
>>>>>>>> Kevin Spencer
>>>>>>>> Microsoft MVP
>>>>>>>> .Net Developer
>>>>>>>> Ambiguity has a certain quality to it.
>>>>>>>>
>>>>>>>> "Kevin Spencer" <kevin@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
>>>>>>>> news:%23bxsOlZzFHA.1032@xxxxxxxxxxxxxxxxxxxxxxx
>>>>>>>>> Hi Clinton,
>>>>>>>>>
>>>>>>>>> Yes, I have it. I previously used the freeware Regex Coach
>>>>>>>>> Utility, but it is nowhere near as complete in its support for
>>>>>>>>> various newer Regular Expression syntax and programming languages
>>>>>>>>> in general. It did have one nice feature about it. You could split
>>>>>>>>> a Regular Expression across multiple lines, which often made it
>>>>>>>>> easier to analyze. However, Regex Buddy has the graphical tree
>>>>>>>>> view, and it is synchronized with the Regular Expression itself,
>>>>>>>>> which more than makes up for the omission of breaking a Regular
>>>>>>>>> Expression across multiple lines.
>>>>>>>>>
>>>>>>>>> BTW, it also has a GREP utility built in.
>>>>>>>>>
>>>>>>>>> In short, it is well worth the 30 bucks.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> HTH,
>>>>>>>>>
>>>>>>>>> Kevin Spencer
>>>>>>>>> Microsoft MVP
>>>>>>>>> .Net Developer
>>>>>>>>> Ambiguity has a certain quality to it.
>>>>>>>>>
>>>>>>>>> "clintonG" <csgallagher@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in
>>>>>>>>> message news:%23yIAzKVzFHA.3660@xxxxxxxxxxxxxxxxxxxxxxx
>>>>>>>>>>I was looking at PowerGREP from the same dev group but, like Regex
>>>>>>>>>>Buddy, I don't like the buy-before-you-try business model, so that
>>>>>>>>>>choice has to stay on the shelf for the moment. Thanks for bringing
>>>>>>>>>>it up, though. I assume you've used Regex Buddy?
>>>>>>>>>>
>>>>>>>>>> <%= Clinton Gallagher
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> "Kevin Spencer" <kevin@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in
>>>>>>>>>> message news:%23$hJGuTzFHA.664@xxxxxxxxxxxxxxxxxxxxxxx
>>>>>>>>>>> Regex Buddy is very good. It costs around $30.00, includes quite
>>>>>>>>>>> a few nice features, including the ability to copy regular
>>>>>>>>>>> expressions in various language string syntaxes, including C#.
>>>>>>>>>>> It has the ability to create libraries of regular expressions, a
>>>>>>>>>>> nice visual builder, color-coding, and quite a bit more. Good
>>>>>>>>>>> testing environment. And it has some nice reference material
>>>>>>>>>>> included.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> HTH,
>>>>>>>>>>>
>>>>>>>>>>> Kevin Spencer
>>>>>>>>>>> Microsoft MVP
>>>>>>>>>>> .Net Developer
>>>>>>>>>>> Ambiguity has a certain quality to it.
>>>>>>>>>>>
>>>>>>>>>>> "clintonG" <csgallagher@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote
>>>>>>>>>>> in message news:%23e%23mQdTzFHA.2792@xxxxxxxxxxxxxxxxxxxxxxx
>>>>>>>>>>>> I'm using an .aspx tool I found at [1] but as nice as the
>>>>>>>>>>>> interface is I think I need to consider using others. Some can
>>>>>>>>>>>> generate C# I understand. Your preferences please...
>>>>>>>>>>>>
>>>>>>>>>>>> <%= Clinton Gallagher
>>>>>>>>>>>>
>>>>>>>>>>>> [1] http://forta.com/books/0672325667/
>>>>>>>>>>>>