Re: RegularExpressions
- From: "Kevin Spencer" <kevin@xxxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Fri, 16 Jun 2006 11:30:36 -0400
Hi Serge,
Here you go:
(?m)BRE.*$
Explanation:
"(?m)" means caret (^) and dollar sign ($) match at line breaks.
Match the letters "BRE" followed by zero or more characters that are not
line breaks (.*), followed by a line break or the end of the string ($).
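As a rough illustration, here is the pattern applied to the sample log lines quoted further down this thread. This is a sketch in Python rather than .NET; the variable names are made up for the example. One caveat: with "\r\n" line endings, "." also matches "\r" and "$" only matches just before "\n", so each raw match carries a trailing "\r" that has to be trimmed. As far as I know the .NET Regex class behaves the same way here.

```python
import re

# Sample lines from the log excerpt quoted later in the thread,
# joined with "\r\n" as described in the original question.
log_text = "\r\n".join([
    "3148:48 BRE: 17 13 51: penelope baraz King-Ice1 12:21AM",
    "3148:48 ALD: 17 13 51: penelope baraz King-Ice1 12:26AM",
    "3148:48 BRE: 17 13 51: penelope baraz King-Ice1 12:34AM",
    "3148:48 LLD: 17 13 51: penelope baraz King-Ice1 12:45AM",
])

pattern = re.compile(r"(?m)BRE.*$")

# "." matches "\r" too, so trim it from the end of each match.
matches = [m.group(0).rstrip("\r") for m in pattern.finditer(log_text)]

for line in matches:
    print(line)
```

The text before "BRE" on each line is dropped simply because the match starts at "BRE".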
--
HTH,
Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist
A lifetime is made up of
Lots of short moments.
"Serge" <Serge@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:A66BA6B9-E908-4CA7-A7BE-EC8A1FFD1354@xxxxxxxxxxxxxxxx
Thank you for your response. I will try to do it by using an array of
characters. Meanwhile, could you show me the RegEx pattern that I can use? I
just want to compare the 2 approaches and use the faster of the 2.
Thank you very much
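One possible way to compare the two approaches is to time them on the same input and check that they return the same lines. The sketch below is in Python, with hypothetical function names; `with_find` uses a plain substring search as a stand-in for the character-array scan, and the test data is synthetic, built from two of the sample log lines. Timings from this will not carry over to .NET, so treat it only as a template for the measurement.

```python
import re
import timeit

PATTERN = re.compile(r"(?m)BRE.*$")

def with_regex(text):
    # Regex approach; trim the "\r" that "." picks up before "\n".
    return [m.rstrip("\r") for m in PATTERN.findall(text)]

def with_find(text):
    # Literal search approach: find "BRE", then the next line break.
    results = []
    pos = text.find("BRE")
    while pos != -1:
        end = text.find("\n", pos)
        if end == -1:
            end = len(text)
        results.append(text[pos:end].rstrip("\r"))
        pos = text.find("BRE", end)
    return results

# Synthetic input: 10,000 lines alternating between a BRE and an ALD entry.
line_a = "3148:48 BRE: 17 13 51: penelope baraz King-Ice1 12:21AM"
line_b = "3148:48 ALD: 17 13 51: penelope baraz King-Ice1 12:26AM"
text = "\r\n".join([line_a, line_b] * 5000)

# Both approaches should agree before their speed is compared.
assert with_regex(text) == with_find(text)

for fn in (with_regex, with_find):
    seconds = timeit.timeit(lambda: fn(text), number=10)
    print(f"{fn.__name__}: {seconds:.3f}s for 10 runs")
```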
"Kevin Spencer" wrote:
Hi Serge,
I see. Well, parsing it is likely to add some memory to the equation, but
you could read it in blocks if necessary. I still think a Regular Expression
would not be the way to go, though. Regular Expressions do some
backtracking, and I think that wouldn't be necessary. How about if you read
a block (or the whole file) into an array of characters? You could then move
through the array one character at a time. The sequence would be a loop (in
pseudo-code):
Start at the beginning of the string, or at the first line break character
or sequence ("\r\n" or '\r' - depending on the document type).
Read one character at a time.
Find the character 'B'.
See if it is followed by 'R'.
If so, see if it is followed by an 'E'.
If all 3 are found in a row, read to the next line break.
Basically, that is what a regular expression does, but in a more roundabout
fashion, with backtracking, etc., because it is not looking for literal
characters, but for patterns. Since you're looking for literal characters in
a specific sequence, this solution would be faster, especially if you used a
pointer.
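The loop described above could be sketched roughly as follows. This is Python, not .NET, and the function name `extract_bre_lines` and its `marker` parameter are invented for the illustration. It is meant to show the logic only; a character-by-character loop in Python is not fast, so the speed argument applies to a compiled implementation rather than to this sketch.

```python
def extract_bre_lines(text, marker="BRE"):
    """Scan text one character at a time and return each run
    from the marker to the end of its line."""
    results = []
    i, n = 0, len(text)
    while i < n:
        # Compare the marker's characters one by one, starting at i.
        k = 0
        while k < len(marker) and i + k < n and text[i + k] == marker[k]:
            k += 1
        if k == len(marker):
            # All letters found in a row: read to the next line break.
            end = i
            while end < n and text[end] not in "\r\n":
                end += 1
            results.append(text[i:end])
            i = end
        else:
            i += 1
    return results


# The sample from the original question, with "\r\n" line endings.
sample = (
    "ABCDEF ddfs adasd\r\n"
    "A BRE asdd asd dfddf\r\n"
    "EROI DFIOU eeroo\r\n"
    "B BRE errt ssdrr\r\n"
    "AAA eIR DFDF\r\n"
    "C BRE AAA asdd"
)
for line in extract_bre_lines(sample):
    print(line)
```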
--
HTH,
Kevin Spencer
"Serge" <Serge@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:85214A0F-A301-4E06-BA7A-A3D1EBD43200@xxxxxxxxxxxxxxxx
Thank you for your response. I already tried using it on a PER LINE basis. It
just takes too long. By HUGE file I mean about 10-50 MB. The example I
showed is not the actual file. I was just trying to get a sense of what
pattern to use, as RegEx is a lot faster at string manipulation than the
IndexOf, Substring thing. Basically here are 4 lines from the actual log file.
3148:48 BRE: 17 13 51: penelope baraz King-Ice1 12:21AM
3148:48 ALD: 17 13 51: penelope baraz King-Ice1 12:26AM
3148:48 BRE: 17 13 51: penelope baraz King-Ice1 12:34AM
3148:48 LLD: 17 13 51: penelope baraz King-Ice1 12:45AM
As you can see I have 4 lines in this example but I want to extract lines:
BRE: 17 13 51: penelope baraz King-Ice1 12:21AM
BRE: 17 13 51: penelope baraz King-Ice1 12:34AM
Notice that before the word BRE there is some other info that I don't want.
Looping through all lines takes too much time; however, it takes only a sec or
2 to read it into a string variable, so I don't think it's a problem.
Thank you very much.
"Kevin Spencer" wrote:
Hi Serge,
I'm a little confused by your first and second example. You mentioned
something about not wanting the "first letter," and in your first example,
the lines with "BRE" in them all started with a single letter, but the other
lines did not, and in your second example, the lines did *not* start with a
first letter.
Be that as it may, I know you're chomping at the bit to use regular
expressions here, but in this case you don't want to use a Regular
Expression, even though it would be easy enough to write. Why? Because you
said "I have a huge file." Regular Expressions work with strings, and I
don't think that (1) you want to read a "huge file" into a single string,
and (2) use a regular expression on a string that large.
In fact, from what you've described about the size of the file, and wanting
to parse by line, your best bet (IMHO) would be to use a TextReader to read
the file one line at a time, and use String.IndexOf to evaluate whether or
not to include that line in your results. You could, for example, use a
single character array to read the lines into one at a time.
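For illustration, the line-at-a-time approach might look something like this. It is a Python sketch, not .NET: iterating over a file-like object plays the role of TextReader.ReadLine, and `str.find` plays the role of String.IndexOf. The function name `bre_lines` is made up, and an in-memory `io.StringIO` stands in for the real log file.

```python
import io

def bre_lines(reader, marker="BRE"):
    """Yield the part of each line from the marker to the end of the line."""
    for line in reader:              # one line at a time, never the whole file
        pos = line.find(marker)      # returns -1 when the marker is absent
        if pos != -1:
            yield line[pos:].rstrip("\r\n")


# In-memory stand-in for the log file, using the sample lines from this thread.
sample_file = io.StringIO(
    "3148:48 BRE: 17 13 51: penelope baraz King-Ice1 12:21AM\r\n"
    "3148:48 ALD: 17 13 51: penelope baraz King-Ice1 12:26AM\r\n"
    "3148:48 BRE: 17 13 51: penelope baraz King-Ice1 12:34AM\r\n"
    "3148:48 LLD: 17 13 51: penelope baraz King-Ice1 12:45AM\r\n"
)
results = list(bre_lines(sample_file))
for line in results:
    print(line)
```

Because only one line is held at a time, memory use stays flat regardless of file size.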
--
HTH,
Kevin Spencer
"Serge" <Serge@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:A5EFF452-BB34-44CB-8477-7FAE1A3EC505@xxxxxxxxxxxxxxxx
That's the thing... I'm asking for a RegularExpression pattern. I know I can
just loop through all lines and use IndexOf and Substring... but I have a huge
file and it will take forever. That is why I'm asking if anyone has more
experience with RegularExpressions, because it is new to me.
Also yes, I said I want to extract all lines that start with BRE. So in my
example I want lines
BRE asdd asd dfddf
BRE errt ssdrr
BRE AAA asdd
Notice how I don't want the first character (or it could be more than 1
character). I just want to get the line that starts with BRE to the end of
the line.
Thank you very much for your time
Serge
"Sanjib Biswas" wrote:
In your example, you said lines start with BRE, but at the end you said you
want lines 2, 4, and 6. So I am assuming you mean lines containing BRE.
In that case, after you have opened the log file, read a line and do a string
match to see whether that line contains BRE, and if it does, store that
line in an array or collection.
Regards
Sanjib
"Serge" <Serge@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:501E40F3-F275-457B-9956-440D1324C7A8@xxxxxxxxxxxxxxxx
Hello,
I have a log file that looks something like this
ABCDEF ddfs adasd
A BRE asdd asd dfddf
EROI DFIOU eeroo
B BRE errt ssdrr
AAA eIR DFDF
C BRE AAA asdd
All lines are separated by NEWLINE (\r\n). I want to extract lines that start
with BRE all the way to the end of the line and put them into a collection or
an array. So in this case I want lines 2, 4, 6.
Do any of you RegularExpressions gurus have an idea?
Thank you