Re: Regular Expressions in C#

I couldn't help but bite on this one. It is a very challenging problem. Here
is your solution:

(?i)(?:(?<function>Write|Read)\s*\()\s*|(?<=(?:(?:Write|Read)\s*\(\s*)|(?:(?:[\d\w]+\s*,\s*)))(?<parameter>[\d\w]+)(?=,\s*|\s*\))

Let me break it down a bit. First, I used (?i) to make the match
case-insensitive.
Next, I had the problem of identifying *both* function names and parameters
in the same Regular Expression.

The function name Regular Expression is:

(?:(?<function>Write|Read)\s*\(\s*)

"function" is the name of the capturing group, which captures only the
function name. The rest of the match is to identify it as a function.

It will match only if the function name is "Read" or "Write" and is followed
by an opening parenthesis. I assumed that any token may have any number of
white-space characters before and after it. This was not too tricky.
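
For what it's worth, the function-name pattern can be tried on its own in C#. This is only a rough sketch: the helper name FindFunctionName and the sample lines are made up for illustration, (?i) is prepended so the pattern behaves as it does in the full expression, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Function-name pattern from the post, with (?i) prepended for case-insensitivity.
string functionPattern = @"(?i)(?:(?<function>Write|Read)\s*\(\s*)";

// Hypothetical helper: returns the captured function name,
// or an empty string when the line contains no Read/Write call.
string FindFunctionName(string line)
{
    Match m = Regex.Match(line, functionPattern);
    return m.Success ? m.Groups["function"].Value : "";
}

Console.WriteLine(FindFunctionName("Write( 0x123, 0x12, 25, 100 )")); // Write
Console.WriteLine(FindFunctionName("varName2 = read ( varName1 )"));  // read
```

The whole match includes the parenthesis and any surrounding white space, but only the name itself lands in the "function" group.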

The second one is a bit trickier:

(?<=(?:(?:Write|Read)\s*\(\s*)|(?:(?:[\d\w]+\s*,\s*)))(?<parameter>[\d\w]+)(?=,\s*|\s*\))

The trick here is to identify a parameter from inside a set of function
parameters.

The rules break down as:

1. The first parameter is always preceded by a function name followed by an
open parenthesis, as in:

Write (

2. Any later parameter is preceded by another parameter followed by a comma:

Write(param1,

- or -

Write(.......param3,

3. Every parameter is followed by either a comma or an end-parenthesis:

param1,
- or -
param2 )

So, starting with the third rule, we get:

(?<parameter>[\d\w]+)(?=,\s*|\s*\))

"parameter" is the name of the capturing group, which according to these
rules is an alphanumeric token. The rest is a positive look-ahead, which
means that the token *must* be followed by either a comma or an end
parenthesis, without that trailing text becoming part of the match.

However, the problem here is that *any* word in the string that is not a
function and is followed by a comma or an end parenthesis will match this,
as in:

Read( 0x55, 5 ) <- Write one byte, to (address 0x55)

In this line, "byte" and the second "0x55" (inside "(address 0x55)") will
match, even though they are only part of the comment.
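
The over-matching is easy to reproduce. Here is a small C# sketch that runs the look-ahead-only pattern against that line; the helper name FindTokens is hypothetical, and top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Rule 3 only: a token followed by a comma or a closing parenthesis.
string lookaheadOnly = @"(?<parameter>[\d\w]+)(?=,\s*|\s*\))";

// Hypothetical helper: collects every "parameter" capture found in a line.
string[] FindTokens(string line) =>
    Regex.Matches(line, lookaheadOnly)
         .Cast<Match>()
         .Select(m => m.Groups["parameter"].Value)
         .ToArray();

string sample = "Read( 0x55, 5 ) <- Write one byte, to (address 0x55)";
Console.WriteLine(string.Join(" | ", FindTokens(sample)));
// Prints: 0x55 | 5 | byte | 0x55
```

The first two tokens are the real parameters; the last two come from the trailing comment.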

So, how do we eliminate non-parameters? Well, obviously, a parameter is
defined as being inside the parentheses of a function call. So, first, use a
positive look-behind to see if it is preceded by a function call. We need to
identify the function, using the same syntax as before:

(?:(?:Write|Read)\s*\(\s*)

However, it may have a parameter before it, instead of the function call. So
we use an OR "|" operator to indicate that it may be preceded by:

(?:(?:[\d\w]+\s*,\s*))

Note that we have changed the rule slightly. Any parameter which precedes
another parameter will *not* be followed by an end-parenthesis. It will
*always* be followed by a comma.

So, we use the Positive Lookbehind syntax (?<=...) coupled with an OR operator
("|"), and get:

(?<=(?:(?:Write|Read)\s*\(\s*)|(?:(?:[\d\w]+\s*,\s*)))(?<parameter>[\d\w]+)(?=,\s*|\s*\))

Translated: Match any alphanumeric token which is followed by either
a comma or an end parenthesis, and is preceded either by a function call or
by another parameter.
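
A quick way to check this is to run the parameter pattern by itself in C#. The sketch below is illustrative only: FindParameters and the sample lines are hypothetical, (?i) is prepended as in the full expression, and top-level statements assume C# 9 or later. It relies on the .NET engine accepting a variable-length look-behind.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Parameter pattern from the post: look-behind + named group + look-ahead.
string parameterPattern =
    @"(?i)(?<=(?:(?:Write|Read)\s*\(\s*)|(?:(?:[\d\w]+\s*,\s*)))(?<parameter>[\d\w]+)(?=,\s*|\s*\))";

// Hypothetical helper: returns the captured parameters joined with '|'.
string FindParameters(string line) =>
    string.Join("|",
        Regex.Matches(line, parameterPattern)
             .Cast<Match>()
             .Select(m => m.Groups["parameter"].Value));

Console.WriteLine(FindParameters("Read( 0x55, 5 ) <- Write one byte, to (address 0x55)"));
// Prints: 0x55|5
Console.WriteLine(FindParameters("Write( 0x123, 0x12, 25, 100 ) <- Write three bytes to address 0x123"));
// Prints: 0x123|0x12|25|100
```

On the problem line from earlier, only 0x55 and 5 come back. The look-behind inspects just the text immediately before the token, so a comment that itself contains something like "a, b)" would still yield b.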

Now to put them together, we use the OR operator:

(?i)(?:(?<function>Write|Read)\s*\()\s*|(?<=(?:(?:Write|Read)\s*\(\s*)|(?:(?:[\d\w]+\s*,\s*)))(?<parameter>[\d\w]+)(?=,\s*|\s*\))

The function name will be captured into the "function" group, and all of the
parameters will be captured into the "parameter" group. This could be stated
as:

Match any token that is either "Read" or "Write" followed by an open
parenthesis, and call it "function," OR match any alphanumeric token
which is followed by either a comma or an end parenthesis, and is preceded
either by a function call or by another parameter, and call it "parameter."
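
To show how the two groups might be told apart in code, here is a minimal C# sketch that walks the matches of the combined expression. The helper name Tokenize and the "function:" / "parameter:" labels are assumptions made for this example, and top-level statements assume C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The combined expression from the post.
string fullPattern =
    @"(?i)(?:(?<function>Write|Read)\s*\()\s*|(?<=(?:(?:Write|Read)\s*\(\s*)|(?:(?:[\d\w]+\s*,\s*)))(?<parameter>[\d\w]+)(?=,\s*|\s*\))";

// Hypothetical helper: labels each match by whichever named group took part in it.
List<string> Tokenize(string line)
{
    var tokens = new List<string>();
    foreach (Match m in Regex.Matches(line, fullPattern))
    {
        if (m.Groups["function"].Success)
            tokens.Add("function:" + m.Groups["function"].Value);
        else if (m.Groups["parameter"].Success)
            tokens.Add("parameter:" + m.Groups["parameter"].Value);
    }
    return tokens;
}

foreach (string token in Tokenize("varName2 = Read( varName1 )"))
    Console.WriteLine(token);
// Prints:
// function:Read
// parameter:varName1
```

Each match fills exactly one of the two groups, so checking Success on the group is enough to decide what kind of token was found.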

You sure picked a doozy to start out with!

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Numbskull

Hard work is a medication for which
there is no placebo.

<LordHog@xxxxxxxxxxx> wrote in message
news:1144962018.113580.94720@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hello all,

I am attempting to create a small scripting application to be used
during testing. Once I extract the commands from the script file, I was
going to tokenize each line, as one of the requirements is that there is
one command per line. I have always wanted to learn Regular Expressions, so
I was hoping I might do this using Regular Expressions. A fair number of
the commands will have syntax like

Write( 0x123, 0x12, 25, 100 ) <- Write three bytes to address 0x123
Write(varName1, 0x12)         <- Write one bytes to address expressed
                                 by the value of varName1
Read( 0x55, 5 )               <- Write one bytes to address 0x55
Read(0x3456, 0x12)            <- Read eighteen bytes to address 0x3456
varName2 = Read( varName1 )   <- Read one byte from address expressed
                                 by the value of varName1 and store
                                 that read value to varName2


I know that the regular expression (^[a-zA-Z]*) will find the
initial keywords or variable names, which I can then check to make sure
they are valid or that the variable has already been declared, but the
hard part is creating a regular expression to match the various forms of
the syntax. How would I create a regular expression for the first
and last script commands? I think with those I can attempt to determine
the others. The spaces between the arguments are optional and may be
omitted if the user so desires.

For the first script command I was attempting to craft one that looks
like..

(^[a-zA-Z]*)('\(')(['0x',0-9][a-zA-Z]*)(',')(['0x',0-9][a-zA-Z]*)

but this obviously doesn't work. Any help is greatly appreciated.

Mark


