Re: RegularExpressions



Hi Serge,

I see. Well, parsing it is likely to add some memory to the equation, but
you could read it in blocks if necessary. I still think a Regular Expression
would not be the way to go, though. Regular Expressions do some
backtracking, and I think that wouldn't be necessary. How about if you read
a block (or the whole) into an array of characters? You could then move
through the array one character at a time. The sequence would be a loop (in
pseudo-code):

Start at the beginning of the string, or at the first line break character
or sequence ("\r\n" or '\r', depending on the document type).
Read one character at a time.
Find the character 'B'.
See if it is followed by 'R'.
If so, see if it is followed by an 'E'.
If all 3 are found in a row, read to the next line break.
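
The steps above could be sketched roughly as follows. This is only an
illustration, written in Python rather than C#; the function name
scan_for_marker and the "BRE" default are assumptions made for the example,
not code from this thread. In C# the same loop would walk a char[] (or a
pointer), which is where any speed benefit would have to come from.

```python
def scan_for_marker(text, marker="BRE"):
    """Collect the text from each occurrence of `marker` to the end of its line."""
    results = []
    i, n = 0, len(text)
    while i < n:
        # Compare the marker one character at a time, starting at position i.
        j = 0
        while j < len(marker) and i + j < n and text[i + j] == marker[j]:
            j += 1
        if j == len(marker):
            # All marker characters were found in a row: read to the next line break.
            end = i
            while end < n and text[end] not in "\r\n":
                end += 1
            results.append(text[i:end])
            i = end  # continue scanning from the line break
        else:
            i += 1
    return results


sample = "3148:48 BRE: 17 13 51\r\n3148:48 ALD: 17 13 51\r\n"
print(scan_for_marker(sample))  # ['BRE: 17 13 51']
```

Note that the sketch is case-sensitive and keeps only the first match on a
line, since everything up to the line break is consumed with it.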

Basically, that is what a regular expression does, but in a more roundabout
fashion, with backtracking, etc., because it is not looking for literal
characters, but for patterns. Since you're looking for literal characters in
a specific sequence, this solution would be faster, especially if you used a
pointer.

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

A lifetime is made up of
Lots of short moments.

"Serge" <Serge@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:85214A0F-A301-4E06-BA7A-A3D1EBD43200@xxxxxxxxxxxxxxxx
Thank you for your response. I already tried using it on a PER LINE basis.
It just takes too long. By HUGE file I mean about 10-50 MB. The example I
showed is not the actual file. I was just trying to get a sense of what
pattern to use, as RegEx is a lot faster at string manipulation than the
IndexOf/Substring approach. Here are 4 lines from the actual log file.


3148:48 BRE: 17 13 51: penelope baraz King-Ice1 12:21AM
3148:48 ALD: 17 13 51: penelope baraz King-Ice1 12:26AM
3148:48 BRE: 17 13 51: penelope baraz King-Ice1 12:34AM
3148:48 LLD: 17 13 51: penelope baraz King-Ice1 12:45AM

As you can see, I have 4 lines in this example, but I want to extract only
these:

BRE: 17 13 51: penelope baraz King-Ice1 12:21AM
BRE: 17 13 51: penelope baraz King-Ice1 12:34AM

Notice that before the word BRE there is some other info that I don't
want.

Looping through all the lines takes too much time. However, it takes only a
second or 2 to read the file into a string variable, so I don't think that
is a problem.
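
For comparison, the kind of pattern being asked for here might look like
the sketch below, applied to the whole file once it is in a single string.
It is written in Python; the pattern itself (BRE:[^\r\n]*) uses only syntax
that the .NET Regex class also accepts. The trailing colon after BRE is an
assumption taken from the sample lines above, and the names BRE_LINE and
extract_with_regex are made up for the example.

```python
import re

# "BRE:" followed by everything up to (but not including) the line break.
BRE_LINE = re.compile(r"BRE:[^\r\n]*")


def extract_with_regex(text):
    """Return every 'BRE: ...' tail found in `text`, one entry per match."""
    return BRE_LINE.findall(text)
```

Whether this is actually faster than a per-line loop on a 10-50 MB file
would need to be measured on the real data.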

Thank you very much.
"Kevin Spencer" wrote:

Hi Serge,

I'm a little confused by your first and second example. You mentioned
something about not wanting the "first letter," and in your first example,
the lines with "BRE" in them all started with a single letter, but the
other lines did not, and in your second example, the lines did *not* start
with a first letter.

Be that as it may, I know you're chomping at the bit to use regular
expressions here, but in this case you don't want to use a Regular
Expression, even though it would be easy enough to write. Why? Because you
said "I have a huge file." Regular Expressions work with strings, and I
don't think that you want to (1) read a "huge file" into a single string,
and (2) use a regular expression on a string that large.

In fact, from what you've described about the size of the file, and
wanting to parse by line, your best bet (IMHO) would be to use a TextReader
to read the file one line at a time, and use String.IndexOf to evaluate
whether or not to include that line in your results. You could, for
example, use a single character array to read the lines into one at a time.
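
A rough sketch of that line-at-a-time approach, in Python rather than C#:
iterating over the reader stands in for TextReader.ReadLine, and str.find
stands in for String.IndexOf. The function name extract_line_by_line and
the "BRE" default are assumptions made for the example.

```python
import io


def extract_line_by_line(reader, marker="BRE"):
    """Read one line at a time and keep the part of each line from `marker` onward."""
    results = []
    for line in reader:
        pos = line.find(marker)  # -1 when the marker is not on this line
        if pos != -1:
            results.append(line[pos:].rstrip("\r\n"))
    return results


# Any text stream works; an in-memory one is used here in place of a log file.
demo = io.StringIO("A BRE asdd asd dfddf\nEROI DFIOU eeroo\nB BRE errt ssdrr\n")
print(extract_line_by_line(demo))  # ['BRE asdd asd dfddf', 'BRE errt ssdrr']
```

Only one line is held in memory at a time, which is the point of the
suggestion above. For a real file, the reader would be an open file object
instead of the StringIO.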

--
HTH,

Kevin Spencer

"Serge" <Serge@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:A5EFF452-BB34-44CB-8477-7FAE1A3EC505@xxxxxxxxxxxxxxxx
That's the thing... I'm asking for a RegularExpression pattern. I know I
can just loop through all lines and use IndexOf and Substring, but I have a
huge file and it will take forever. That is why I'm asking if anyone has
more experience with RegularExpressions, because it is new to me.

Also yes, I said I want to extract all lines that start with BRE. So in my
example I want lines

BRE asdd asd dfddf
BRE errt ssdrr
BRE AAA asdd

Notice how I don't want the first character (or it could be more than 1
character). I just want to get the line from where BRE starts to the end of
the line.

Thank you very much for your time

Serge


"Sanjib Biswas" wrote:

In your example, you said the line starts with BRE, but at the end you said
you want lines 2, 4, and 6. So I am assuming you mean lines containing BRE.
In that case, after you have opened the log file, read a line and do a
string match to see whether that line contains BRE, and if it does, store
that line in an array or collection.

Regards
Sanjib

"Serge" <Serge@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:501E40F3-F275-457B-9956-440D1324C7A8@xxxxxxxxxxxxxxxx
Hello,

I have a log file that looks something like this

ABCDEF ddfs adasd
A BRE asdd asd dfddf
EROI DFIOU eeroo
B BRE errt ssdrr
AAA eIR DFDF
C BRE AAA asdd

All lines are separated by NEWLINE (\r\n). I want to extract lines that
start with BRE, all the way to the end of the line, and put them into a
collection or an array. So in this case I want lines 2, 4, and 6.

Do any of you RegularExpressions gurus have an idea?

Thank you








