Re: regular expression help
- From: "Jeremy" <nospam@xxxxxxxxxx>
- Date: Wed, 6 Feb 2008 14:15:36 -0800
Ahhhhhh! Ok. Thanks, still trying to wrap my head around Regex.
"Jesse Houwing" <jesse.houwing@xxxxxxxxxxxxxxxx> wrote in message
news:21effc903e99b8ca366eeaf04166@xxxxxxxxxxxxxxxxxxxxx
Hello Jeremy,
Thanks for the regex.
If you perform a match, you will still get 5 matches though on text
such as
1,4,"32,760.00"
You will get "1", "", "4", "", "32,760.00"
I don't understand why it returns 2 empty strings. Any ideas?
Basically because if you remove everything that is optional in the regex
below you end up with an empty regex:
\s*(?:(?<word>"[^"]*")|(?<!")(?<word>[^,"]*)(?<!"))\s*
\s* is optional
(?:(?<word>"[^"]*")|(?<!")(?<word>[^,"]*)(?<!")) is optional because one
of the parts of the alternation, namely (?<!")(?<word>[^,"]*)(?<!"), is
optional
\s* is optional
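A minimal sketch to check this, assuming a plain console program (the class name EmptyMatchDemo and the helper Describe are made up for illustration). It tests the pattern against an empty string and then lists every match on the sample line with its start index, so the zero-length matches show up:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmptyMatchDemo
{
    // Pattern quoted from the thread; every part of it can match nothing.
    public const string Pattern =
        @"\s*(?:(?<word>""[^""]*"")|(?<!"")(?<word>[^,""]*)(?<!""))\s*";

    // Returns "index:[value]" for each match so empty matches are visible.
    public static List<string> Describe(string text)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(text, Pattern, RegexOptions.ExplicitCapture))
        {
            result.Add(m.Index + ":[" + m.Groups["word"].Value + "]");
        }
        return result;
    }

    public static void Main()
    {
        // The pattern succeeds even on an empty input.
        Console.WriteLine(Regex.Match("", Pattern).Success);   // True

        // Prints five lines:
        // 0:[1]
        // 1:[]
        // 2:[4]
        // 3:[]
        // 4:["32,760.00"]
        foreach (string line in Describe("1,4,\"32,760.00\""))
        {
            Console.WriteLine(line);
        }
    }
}
```

The two empty entries start at index 1 and 3, which are the positions of the commas.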
So the regex engine will try to match at every position in the string:
1            stop, first match is found
,            the comma doesn't match, but the empty string just in front of it does (second match)
4            stop, third match is found
,            the comma doesn't match, but the empty string just in front of it does (fourth match)
"32,760.00"  and the last match is found.
You can easily demonstrate this by replacing instead of matching. Replace
the matches with something like $ and all the places where a match is
found become visible.
Using the above regex and $$ as the replacement pattern you will get the
following result:
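A small sketch to reproduce the replacement, assuming a console program (ReplaceDemo and Mark are illustrative names, not from the thread). In a .NET replacement pattern $$ stands for one literal $, so each match, empty or not, becomes a single $:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // Same pattern as above, quoted from the thread.
    public const string Pattern =
        @"\s*(?:(?<word>""[^""]*"")|(?<!"")(?<word>[^,""]*)(?<!""))\s*";

    // Replaces every match (including zero-length ones) with a literal "$".
    public static string Mark(string text)
    {
        return Regex.Replace(text, Pattern, "$$", RegexOptions.ExplicitCapture);
    }

    public static void Main()
    {
        // Prints: $$,$$,$
        Console.WriteLine(Mark("1,4,\"32,760.00\""));
    }
}
```

The doubled $ in front of each comma is one real value followed by one empty match; the commas themselves are never part of a match and stay in place.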
The issue can be solved in two ways:
1) make sure there are no optional parts:
\s*(?:(?<word>"[^"]*")|(?<!")(?<word>[^,"]+)(?<!"))\s*
2) match what you want even more closely by anchoring it within the
pattern of beginning of the line, comma-separated values, end of the
line:
(?<=^|,)\s*("(?<word>[^"]*)"|(?<word>[^",]*?))\s*(?=,|$)
This last regex also removes any whitespace around the non quoted values
and removes the quotes from quoted values.
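Both suggestions can be tried side by side. This is only a sketch under the same assumptions as before (console program, ExplicitCapture as in Jeremy's code; CsvFixDemo and Words are made-up names):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvFixDemo
{
    // Fix 1 from the post: "+" instead of "*" in the unquoted branch.
    public const string NoEmptyMatches =
        @"\s*(?:(?<word>""[^""]*"")|(?<!"")(?<word>[^,""]+)(?<!""))\s*";

    // Fix 2 from the post: each field must sit between line start or a comma
    // and a comma or line end.
    public const string Anchored =
        @"(?<=^|,)\s*(""(?<word>[^""]*)""|(?<word>[^"",]*?))\s*(?=,|$)";

    // Collects the "word" group of every match.
    public static List<string> Words(string text, string pattern)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern, RegexOptions.ExplicitCapture))
        {
            words.Add(m.Groups["word"].Value);
        }
        return words;
    }

    public static void Main()
    {
        string text = "1,4,\"32,760.00\"";

        // Prints: 1 | 4 | "32,760.00"   (quotes are still part of the value)
        Console.WriteLine(string.Join(" | ", Words(text, NoEmptyMatches)));

        // Prints: 1 | 4 | 32,760.00     (quotes stripped by the pattern)
        Console.WriteLine(string.Join(" | ", Words(text, Anchored)));
    }
}
```

Note that with the second pattern a genuinely empty field (as in a,,b) still produces an empty value, which is usually what you want when reading a CSV line.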
Jesse
--
Code:
System.Text.RegularExpressions.MatchCollection pMatches =
    System.Text.RegularExpressions.Regex.Matches(strText, strRegex,
        System.Text.RegularExpressions.RegexOptions.ExplicitCapture);
foreach (System.Text.RegularExpressions.Match pMatch in pMatches)
{
    System.Text.RegularExpressions.Group pGroup =
        pMatch.Groups["word"];
    string strValue = pGroup.Value;
}
"Kevin Spencer" <unclechutney@localhost> wrote in message
news:O5MCTjzYIHA.1532@xxxxxxxxxxxxxxxxxxxxxxx
Use the following:
\s*(?:(?<word>"[^"]*")|(?<!")(?<word>[^,"]*)(?<!"))\s*
Rather than splitting, it captures all of the elements without the
commas (and spaces) between them. The way it works is this:
It uses a non-capturing group to indicate that either of the two
choices may be preceded and followed by 0 or more spaces. This
eliminates preceding and following spaces from the groups.
It will capture one of two options:
A quote followed by any sequence of characters that is not a quote,
followed by a quote.
Any sequence of characters that is NOT preceded by a quote and does
not contain either quotes or commas, and is NOT followed by a quote.
The group "word" will give you all the matches that you want.
-- HTH,
Kevin Spencer
Chicken Salad Surgeon
Microsoft MVP
"Jeremy" <nospam@xxxxxxxxxx> wrote in message
news:eErTauqYIHA.5208@xxxxxxxxxxxxxxxxxxxxxxx
I created a regular expression to parse a line in a CSV file:
(\"(?<word>[^\"]+|\"\")*\"|(?<word>[^,]*))
It is capable of taking a line such as
field1,field2,field 3,123.12,"1,234.56"
and matching each value between the commas into the word group, so I get
field1
field2
field 3
123.12
1,234.56
My problem is that if I perform a split, or match on a string like
1,1,"123.345" I will get 6 matches back instead of 3.
const string strDelimiter =
    "(\\\"(?<word>[^\\\"]+|\\\"\\\")*\\\"|(?<word>[^,]*))";
string strText = "1,1,\"12,212.43\"";
string[] strParts =
    System.Text.RegularExpressions.Regex.Split(strText, strDelimiter,
        System.Text.RegularExpressions.RegexOptions.Compiled |
        System.Text.RegularExpressions.RegexOptions.ExplicitCapture);
System.Text.RegularExpressions.MatchCollection pMatches =
    System.Text.RegularExpressions.Regex.Matches(strText, strDelimiter,
        System.Text.RegularExpressions.RegexOptions.ExplicitCapture);
Split returns 13 values, as shown below, and Matches returns 6 items.
How can I just extract the 3 items?
strParts {Dimensions:[13]} string[]
[0] "" string
[1] "1" string
[2] "" string
[3] "" string
[4] "," string
[5] "1" string
[6] "" string
[7] "" string
[8] "," string
[9] "30,478.50" string
[10] "" string
[11] "" string
[12] "" string
Jesse Houwing
jesse.houwing at sogeti.nl