RE: HowTo? RegEx - pattern to exclude the whole word



Hello,

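Have you tried something like this? Making the quantifier after "SUB:"
lazy (*? instead of *) stops each match at the first "LOT:" that follows:

(SUB:(.)*?LOT:(.)+?\s)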

Let me know if it solved your problem.

--------------------
From: shonend@xxxxxxxxx
Newsgroups: microsoft.public.dotnet.general
Subject: HowTo? RegEx - pattern to exclude the whole word
Date: 8 Feb 2006 11:01:28 -0800
Organization: http://groups.google.com

I am trying to extract text that matches a pattern like this:

"SUB: some text LOT: one-word"

In other words, "SUB" and "LOT" are keywords; I want those words,
everything in between, and the one word following "LOT:". The source
text may contain multiple "SUB: ... LOT:" blocks.

For example, this is my source text:

SUB: this text I want to extract LOT: 2345 , something in between, new
SUB: again something I want to extract LOT: 2145 and more text here,
the end

When I apply this pattern:

SUB:\s+[^\r\n]+\s+LOT:\s+[^\r\n\s]+

in .NET's Regex.Matches(...), I only get one match:

SUB: this text I want to extract LOT: 2345 , something in between, new
SUB: again something I want to extract LOT: 2145

Obviously, something in this regex makes it "greedy", and I need the
shorter, partial matches too.

I thought this pattern would return ALL matches, which are:
1) SUB: this text I want to extract LOT: 2345
2) SUB: again something I want to extract LOT: 2145
3) SUB: this text I want to extract LOT: 2345 , something in between,
new SUB: again something I want to extract LOT: 2145

I don't need the last one, of course, but I can handle it: ignore it
and use only the first two.
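A minimal sketch in Python's `re` module reproduces the single match and shows the effect of a lazy quantifier (Python is used here only for illustration; .NET's Regex treats `+?` and these character classes the same way). The sample text is assumed to be one line that the newsgroup client wrapped, which is consistent with the single match spanning both "SUB:" blocks even though `[^\r\n]+` cannot cross a line break. The greedy `[^\r\n]+` first takes the rest of the line and then backtracks only as far as the last "LOT:"; the lazy `[^\r\n]+?` grows one character at a time and stops at the first "LOT:".

```python
import re

# Sample source text from the post, assumed to be a single line.
SOURCE = ("SUB: this text I want to extract LOT: 2345 , something in between, "
          "new SUB: again something I want to extract LOT: 2145 and more text here, the end")

# Pattern from the post: [^\r\n]+ is greedy.
GREEDY = r"SUB:\s+[^\r\n]+\s+LOT:\s+[^\r\n\s]+"
# Same pattern with a lazy quantifier: [^\r\n]+? expands only as far as needed.
LAZY = r"SUB:\s+[^\r\n]+?\s+LOT:\s+[^\r\n\s]+"

greedy_matches = re.findall(GREEDY, SOURCE)
lazy_matches = re.findall(LAZY, SOURCE)

for m in greedy_matches:
    print("greedy:", m)
for m in lazy_matches:
    print("lazy:  ", m)
```

Note that Regex.Matches, like `re.findall`, returns successive non-overlapping matches, so result 3) can never be reported together with 1) and 2) by a single pattern.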

So my idea was to modify my pattern to say this:
give me every match consisting of "SUB:", the text up to "LOT:", that
keyword, and one word after "LOT:", but (!) the text in between must
not contain "LOT:".
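The rule "the text in between cannot contain LOT:" can be written directly with a negative lookahead: `(?:(?!LOT:).)*` consumes one character at a time, but only where "LOT:" does not start. A minimal sketch in Python's `re` (assumed for illustration; .NET's Regex supports the same `(?!...)` syntax), again assuming the sample is a single line:

```python
import re

SOURCE = ("SUB: this text I want to extract LOT: 2345 , something in between, "
          "new SUB: again something I want to extract LOT: 2145 and more text here, the end")

# "SUB:", then a run of characters in which "LOT:" never starts
# (each character is consumed only if the lookahead (?!LOT:) succeeds),
# then "LOT:" and the one word after it, captured in group 1.
PATTERN = re.compile(r"SUB:(?:(?!LOT:).)*LOT:\s*(\S+)")

results = [(m.group(0), m.group(1)) for m in PATTERN.finditer(SOURCE)]
for whole, word in results:
    print(whole, "->", word)
```

Because the run between the keywords can never step over a "LOT:", a match spanning both blocks (result 3) is impossible, whether the run is greedy or lazy.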

If I manage to compose such a RegEx pattern, it would even eliminate
result 3) and return only what I really need. The problem is how to
define a pattern that excludes a whole word. I tried the "[^ ... ]"
pattern, but that works only for the single characters listed between
the brackets.
For example:

SUB:\s+[^\r\n(LOT:)]+\s+LOT:\s+[^\r\n\s]+

is not working. I thought the "( )" brackets would group the characters
and tell the regex not to match the whole word "LOT:". Instead, it
rejects any text that contains any of these characters:
) ( : L T O
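This follows from how character classes work: inside `[...]` the parentheses are ordinary characters, so `[^\r\n(LOT:)]` is just the set "anything except CR, LF, (, L, O, T, :, )". A short sketch in Python's `re` (same semantics as .NET here; the sample strings are made up for illustration) shows it rejecting text that never contains the word "LOT:":

```python
import re

# Inside [...] the parentheses are literal characters, so this class means
# "any character except CR, LF, '(', 'L', 'O', 'T', ':' and ')'".
CLASS_RUN = re.compile(r"[^\r\n(LOT:)]+")

samples = ["this text I want", "Order 12", "TOTAL", "(draft)"]

# fullmatch succeeds only if every character of s lies outside the set.
verdicts = {s: bool(CLASS_RUN.fullmatch(s)) for s in samples}
for s, ok in verdicts.items():
    print(repr(s), ok)
```

Only "this text I want" is accepted; the other three are rejected because each contains at least one of the listed characters, not because any of them contains "LOT:".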

So if you could answer at least one of the following questions, I would
appreciate it very much:

1) In general, how do you compose a regex pattern that does not match
text containing a certain word?
2) If there is no easy solution for 1), or there is a better solution
to the problem I described above, what is it?
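For question 1), a common general form is a "tempered" run, `(?:(?!WORD).)*`, which matches any stretch of text in which WORD never starts; anchoring it to the whole input gives a "does not contain WORD" test. The sketch below uses Python's `re`, and the helper names `without_word` and `lacks_word` are hypothetical, chosen for illustration only:

```python
import re

def without_word(word: str) -> str:
    """Regex fragment matching any run of characters in which `word` never starts.

    re.escape makes any regex metacharacters in `word` literal.
    """
    return r"(?:(?!" + re.escape(word) + r").)*"

def lacks_word(text: str, word: str) -> bool:
    # The whole text must be one such run; DOTALL lets '.' cross line breaks.
    return re.fullmatch(without_word(word), text, flags=re.DOTALL) is not None

print(lacks_word("some text, new", "LOT:"))     # True
print(lacks_word("extract LOT: 2345", "LOT:"))  # False
```

The same fragment can be placed between other tokens of a larger pattern, which is how it addresses the "SUB: ... LOT:" problem. For a plain yes/no check on a whole string, an ordinary substring test is simpler than a regex.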

Thank you so much!

Shone



--

Thank You,
Nanda Lella,

This Posting is provided "AS IS" with no warranties, and confers no rights.
