Re: RegEx problem



* Martin# wrote, On 28-6-2007 18:40:
Hello,

First, very good and detailed answer! (Got a positive rating from me.)

Thank you :)

But I would prefer the string.Split solution that you also presented.
A quick test with a loop and two timestamps will show you why!

I hadn't tested it, but I'd guess the difference is large. Regex can do beautiful things, but it isn't the best tool for every problem. As I said before, I'd prefer this solution over the regex one: it's both easier to read and faster. The only drawback is that it doesn't validate the input, whereas the regex does that for you.
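A minimal sketch of the kind of loop-and-timestamps test Martin describes, using System.Diagnostics.Stopwatch. The class and method names, the sample input and the iteration count are arbitrary assumptions. The regex is built once outside the loop so its construction cost isn't timed, and each approach is called once before timing so JIT compilation stays out of the numbers. No timings are claimed here; they depend on the machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class SplitVsRegexBenchmark
{
    // Pattern from the answer in this thread; built once so construction isn't timed.
    static readonly Regex NumberList =
        new Regex(@"\[(?<number>\d+)(?:,(?<number>\d+))*\]");

    static readonly char[] Separators = { ',', '[', ']' };

    // Regex approach: returns how many numbers were captured.
    public static int CountWithRegex(string text)
    {
        Match m = NumberList.Match(text);
        return m.Success ? m.Groups["number"].Captures.Count : 0;
    }

    // Split approach: returns how many non-empty pieces remain.
    public static int CountWithSplit(string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    // Times `iterations` calls of `parse` and returns the elapsed milliseconds.
    public static long Time(Func<string, int> parse, string text, int iterations)
    {
        parse(text); // warm-up call so JIT compilation isn't timed
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            parse(text);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    static void Main()
    {
        const string input = "[16,23,1]";   // arbitrary sample input
        const int iterations = 1000000;     // arbitrary; raise it if the timings are too small to compare

        Console.WriteLine("Regex: {0} ms", Time(CountWithRegex, input, iterations));
        Console.WriteLine("Split: {0} ms", Time(CountWithSplit, input, iterations));
    }
}
```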

I'm not sure whether adding an int.TryParse call to the loop you tried would slow it down enough to make it slower than the regex, though; my guess is that it would still be faster.
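One way to add that validation is to run int.TryParse over each piece of the split result. This is a sketch: SplitWithValidation and TryParseList are hypothetical names, and NumberStyles.None is an assumption chosen so that, like \d+, only plain digits are accepted (no sign, whitespace or thousands separator).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

class SplitWithValidation
{
    static readonly char[] Separators = { ',', '[', ']' };

    // Hypothetical helper: splits text such as "[8,9,54]" and checks that every
    // piece is a plain non-negative integer. Returns false on the first bad piece,
    // and also when no numbers were found at all.
    public static bool TryParseList(string text, out List<int> values)
    {
        values = new List<int>();
        if (text == null)
        {
            return false;
        }

        foreach (string piece in text.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            int value;
            // NumberStyles.None accepts digits only: no sign, no whitespace.
            if (!int.TryParse(piece, NumberStyles.None, CultureInfo.InvariantCulture, out value))
            {
                values = null;
                return false;
            }
            values.Add(value);
        }
        return values.Count > 0;
    }

    static void Main()
    {
        List<int> values;
        Console.WriteLine(TryParseList("[16,34]", out values)); // True
        Console.WriteLine(TryParseList("[16,x4]", out values)); // False
    }
}
```

This only checks the pieces, not the structure: "[1,,2]", or "1,2" without brackets, still passes because Split ignores where the separators are. The regex checks the structure as well.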

Jesse

All the best,

And to you.

Jesse



Martin

"Jesse Houwing" wrote:

* jac wrote, On 28-6-2007 17:26:
Hi,


I have a problem with the following code and can't find the bug:

// Set [8,9,54]
ArrayList aArray = new ArrayList();
regStr = new Regex(@"\[(?:(\d+)[,]?)*(\d+)\]");
if(text != null && regStr.IsMatch(text))
{
    Match m = regStr.Match(text);
    GroupCollection groups = m.Groups;
    number = 0;
    for(int i=1;i < groups.Count;i++)
    {
        foreach(Capture c in groups[i].Captures)
        {
            aArray.Add(c.Value.ToString());
            number++;
        }
    }

}

[8,9]: that works, aArray contains 8 and 9
[16,5]: OK, aArray contains 16 and 5
[16,34]: not OK, aArray contains 3 items: 16, 3 and 4
[16]: not OK, aArray contains 2 items: 1 and 6

Why does m.Groups contain 3 groups for [16,34]? And likewise, why does it contain 2 groups for [16]?
I think the problem is the last part of my regular expression, (\d+). It is one group, even if it contains more than one digit. How can I solve this?

Thanks in advance,
jac


\[(?<number>\d+)(?:,(?<number>\d+))*\]

should do the trick. Currently too much of the pattern is optional: both the comma and the whole first group are optional, which they shouldn't be.
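To see where 16, 3, 4 and 1, 6 come from, the sketch below runs the original pattern against the four inputs from the question and collects the captures the same way the question's loop does (class and method names are hypothetical). The greedy (?:(\d+)[,]?)* first consumes every number, leaving nothing for the mandatory final (\d+). The engine then backtracks by giving digits back, and because [,]? lets an iteration end without a comma, one number can be split across two captures. Note that m.Groups.Count is 3 for every successful match of this pattern (group 0 plus the two numbered groups); what changes is how many Captures each group holds.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class OriginalPatternDemo
{
    // The pattern from the question, unchanged.
    static readonly Regex Original = new Regex(@"\[(?:(\d+)[,]?)*(\d+)\]");

    // Collects every capture of groups 1 and 2, as the question's loop does.
    public static List<string> Captures(string text)
    {
        List<string> result = new List<string>();
        Match m = Original.Match(text);
        for (int i = 1; i < m.Groups.Count; i++)
        {
            foreach (Capture c in m.Groups[i].Captures)
            {
                result.Add(c.Value);
            }
        }
        return result;
    }

    static void Main()
    {
        foreach (string input in new[] { "[8,9]", "[16,5]", "[16,34]", "[16]" })
        {
            Console.WriteLine("{0} -> {1}", input, string.Join(" | ", Captures(input).ToArray()));
        }
    }
}
```

It should print 8 | 9, 16 | 5, 16 | 3 | 4 and 1 | 6, matching the results reported in the question.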

The new expression reads

find a [
find a number (one or more digits)
optionally find a comma followed by a number
repeat the optional group as often as possible
find a ]
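The step list maps one-to-one onto the pattern. As a sketch, the same expression can be written with RegexOptions.IgnorePatternWhitespace, which lets each step carry its own # comment; the class name is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

class CommentedPattern
{
    // Same pattern as \[(?<number>\d+)(?:,(?<number>\d+))*\], written out step by step.
    // IgnorePatternWhitespace makes the engine skip the spaces, line breaks and # comments.
    public static readonly Regex NumberList = new Regex(@"
        \[                      # find a [
        (?<number>\d+)          # find a number (one or more digits)
        (?:,(?<number>\d+))*    # a comma followed by a number, zero or more times
        \]                      # find a ]
        ", RegexOptions.IgnorePatternWhitespace);

    static void Main()
    {
        Console.WriteLine(NumberList.IsMatch("[16,34]")); // True
        Console.WriteLine(NumberList.IsMatch("[16,]"));   // False
    }
}
```

Like the compact form, this pattern is not anchored, so IsMatch also succeeds on "abc[1]xyz"; wrap it in ^ and $ if the whole string must be a list.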

Both numbers are captured in the same named group, which makes it easier to extract the values:

Match m = regStr.Match(text);
foreach (Capture c in m.Groups["number"].Captures)
{
    aArray.Add(c.Value);
}

number = aArray.Count;
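A self-contained version of the snippet above, as a sketch: it assumes .NET 2.0 or later and uses List&lt;string&gt; in place of ArrayList; the class and the Extract method are hypothetical names. It also checks m.Success, since Regex.Match returns an unsuccessful Match (not null) when the text doesn't fit.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class NamedGroupExtraction
{
    // Every number lands in the same named group "number".
    static readonly Regex regStr = new Regex(@"\[(?<number>\d+)(?:,(?<number>\d+))*\]");

    public static List<string> Extract(string text)
    {
        List<string> aArray = new List<string>();
        if (text == null)
        {
            return aArray;
        }

        Match m = regStr.Match(text);
        if (m.Success)
        {
            // Each time the named group matches, it leaves one Capture behind, in order.
            foreach (Capture c in m.Groups["number"].Captures)
            {
                aArray.Add(c.Value);
            }
        }
        return aArray;
    }

    static void Main()
    {
        foreach (string text in new[] { "[8,9]", "[16,5]", "[16,34]", "[16]" })
        {
            List<string> aArray = Extract(text);
            int number = aArray.Count;
            Console.WriteLine("{0}: {1} item(s): {2}", text, number, string.Join(", ", aArray.ToArray()));
        }
    }
}
```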

Alternatively, you could use string.Split with '[', ',' and ']' as separator characters, which would probably be faster as well. You can instruct string.Split to ignore empty entries.

string[] results = "[16,23,1]".Split(new char[] { ',', '[', ']' }, StringSplitOptions.RemoveEmptyEntries);
int number = results.Length;
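A quick way to see what the RemoveEmptyEntries option does: the leading '[' and trailing ']' are separators too, so without it Split returns an empty string at each end. The class and method names below are hypothetical.

```csharp
using System;

class SplitOptionsDemo
{
    static readonly char[] Separators = { ',', '[', ']' };

    // Without options, the leading '[' and trailing ']' each leave an empty string behind.
    public static string[] SplitKeepEmpty(string text)
    {
        return text.Split(Separators);
    }

    // RemoveEmptyEntries drops those empty strings, leaving only the numbers.
    public static string[] SplitDropEmpty(string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        Console.WriteLine(SplitKeepEmpty("[16,23,1]").Length); // 5: "", "16", "23", "1", ""
        Console.WriteLine(SplitDropEmpty("[16,23,1]").Length); // 3: "16", "23", "1"
    }
}
```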

I'd prefer this solution over the regex one.

Jesse
