Re: Checking if a list of names appears in a body of text.
- From: "Mr. Arnold" <MR. Arnold@xxxxxxxxxx>
- Date: Sat, 3 May 2008 07:31:49 -0400
"Brent" <writebrent@xxxxxxxxx> wrote in message news:412800b8-0dde-4c8c-ad53-66c257d02021@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
I have a list of company names (say, IBM, Corning, General Motors, and
another 5,000 of them).
If I take a body of text, a news article, for instance, and I want to
see which company names appear in that text, is there an efficient way
to do this?
I thought about looping through the array of names, and doing an
IndexOf or Regex match, but this method is slow. Then I thought about
an array intersection, but this is problematic for two-word company
names (you can't just create the second array based on a split on
spaces).
Any hints would be much appreciated!
Finding the phrase
How do we use the MakePattern method to find our phrase? Let's suppose that we aren't interested in where the phrase occurs, or whether it occurs several times, but just whether or not it appears at all. So our approach will be to take the original phrase, turn it into a pattern, match the pattern, and return true if the pattern has been matched:
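public Boolean PhraseFound(String argPhrase, String argText)
{
    // Turn the phrase into a regular expression pattern.
    String strPattern = MakePattern(argPhrase);
    // Look for the first occurrence of the pattern in the text.
    Match match = Regex.Match(argText, strPattern);
    // Success is true if the pattern matched anywhere in the text.
    return match.Success;
}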
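MakePattern itself is not shown in this post, so here is a minimal sketch of what such a method might look like. Everything in it is an assumption on my part: the class name PhraseSearch, the choice to let any run of whitespace separate the words, and the use of lookarounds so that the phrase only matches as a whole word. It also assumes the phrase contains at least one word. Treat it as an illustration, not as the original article's code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhraseSearch
{
    // Hypothetical MakePattern: the real one is not shown in this thread.
    public static string MakePattern(string phrase)
    {
        // Split on any whitespace (a null separator means "whitespace").
        string[] words = phrase.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // Escape each word so characters such as '.' or '+' are taken literally.
        for (int i = 0; i < words.Length; i++)
        {
            words[i] = Regex.Escape(words[i]);
        }

        // Allow any run of whitespace between words, and require that the
        // phrase is not glued to a word character on either side.
        return @"(?<!\w)" + string.Join(@"\s+", words) + @"(?!\w)";
    }

    public static bool PhraseFound(string phrase, string text)
    {
        return Regex.IsMatch(text, MakePattern(phrase));
    }
}
```

With this sketch, PhraseSearch.PhraseFound("General Motors", text) would still match if a line break falls between the two words, and "IBM" would not match inside "IBMs".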
I used Regex.Match to find the occurrence of a word in a text field or variable. Regex can also report the position of each match, so you can use something like a RichTextBox, move to each word in the box, and highlight every occurrence.
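For a list of several thousand names, one option is to combine all of the names into a single pattern and scan the text once, instead of running one match per name. The sketch below shows that idea and also returns the position of each hit, which is what you would need for highlighting. The class name CompanyScanner and its two methods are made up for this example, and I have not measured it against a plain loop, so test it with your own data before relying on it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CompanyScanner
{
    // Builds one regex that matches any of the given names as whole words.
    public static Regex BuildRegex(IEnumerable<string> names)
    {
        List<string> alternatives = new List<string>();
        foreach (string name in names)
        {
            // Escape so that characters such as '.' or '&' are taken literally.
            alternatives.Add(Regex.Escape(name));
        }

        // Longest first, so "General Motors" is tried before a shorter
        // name such as "General" at the same position.
        alternatives.Sort((a, b) => b.Length.CompareTo(a.Length));

        string pattern = @"(?<!\w)(?:"
            + string.Join("|", alternatives.ToArray())
            + @")(?!\w)";
        return new Regex(pattern, RegexOptions.Compiled);
    }

    // Returns each matched name together with its start index in the text.
    public static List<KeyValuePair<string, int>> FindAll(Regex regex, string text)
    {
        List<KeyValuePair<string, int>> hits = new List<KeyValuePair<string, int>>();
        foreach (Match m in regex.Matches(text))
        {
            hits.Add(new KeyValuePair<string, int>(m.Value, m.Index));
        }
        return hits;
    }
}
```

Build the Regex once and reuse it for every article. Each hit's Value, its start index, and Value.Length give you the span to select and highlight in a RichTextBox.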