Re: Where can I find a list of word breakers?
- From: "Dave Poole" <dp00le@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Wed, 4 Jan 2006 13:07:00 -0800
The functionality you mention below exists in SQL 2005. Which version are
you using?
If you are using 2005, you need to set the sp_configure option 'transform
noise words' to 1, i.e.:
exec sp_configure 'transform noise words', 1
reconfigure
This causes noise words to be effectively removed from queries with
Boolean operators: in AND queries a noise word evaluates to true, and in OR
queries it evaluates to false.
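A slightly fuller sketch of the setting above and its effect, under two assumptions: that 'transform noise words' is an advanced option on your build (which is why 'show advanced options' is enabled first; check the sp_configure output on your own server), and that there is a hypothetical table dbo.Messages with a full-text index on its Body column.

```sql
-- Assumption: 'transform noise words' is an advanced option, so advanced
-- options are made visible before it is set.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'transform noise words', 1;
RECONFIGURE;

-- Hypothetical table dbo.Messages(MessageID, Body) with a full-text index on Body.
-- With the option on, the single-letter noise word "j" evaluates to true
-- inside this AND query, so in effect only "foo" and "com" have to match.
SELECT MessageID
FROM dbo.Messages
WHERE CONTAINS(Body, '"j" AND "foo" AND "com"');
```

In an OR query such as '"j" OR "foo"' the noise word evaluates to false instead, so the query behaves like a search for "foo" alone.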
However, you should be able to query for the raw term 'j@xxxxxxx' rather
than breaking it up into terms and joining them with AND. An AND query will
find more documents than it should, since the terms could show up anywhere
in the document. Searching for j@xxxxxxx will only hit documents that have
foo and com in adjacent positions. Also, if the address started with a real
word, as in dave@xxxxxxx, the search would match "dave foo com" in adjacent
positions, which is even better.
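The difference between the two approaches can be sketched against the same hypothetical dbo.Messages table. The address j@example.com is an illustrative stand-in for the obfuscated address in this thread, not the original one; the raw-term query relies on the term being broken at query time the same way it was at index time.

```sql
-- Hypothetical table dbo.Messages(MessageID, Body) with a full-text index on Body.

-- AND query: "example" and "com" may occur anywhere in the document,
-- so unrelated documents that merely contain both words also match.
SELECT MessageID
FROM dbo.Messages
WHERE CONTAINS(Body, '"example" AND "com"');

-- Raw-term query: the term is broken at @ and . just as at index time,
-- and the indexed pieces must occur in adjacent positions.
SELECT MessageID
FROM dbo.Messages
WHERE CONTAINS(Body, '"j@example.com"');
```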
--
Dave Poole
SQL Server Fulltext Team
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm.
"Jim Sneeringer" <JimSneeringer@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:D7F0601D-D4E1-4F18-ABC4-7F145E7FC1C2@xxxxxxxxxxxxxxxx
> Thanks for your answer.
>
> It seems to be just as you say. Both the query and the database text
> contain j@xxxxxxxx. My parser, in order to match what SQL does, changes
> j@xxxxxxx into the "and" of the three words: j, foo and com. It works fine
> unless one of the words is a single letter. It looks to me as if SQL
> ignores one-letter words in the database but does not ignore them in the
> query, so this situation results in a non-hit where the user expects to get
> a hit. I would expect SQL to apply the same rules in both places. If a user
> searches for a noise word (either one in the list or a single-letter word),
> it should be dropped from the query rather than causing a hit to be
> overlooked.
>
> "Dave Poole" wrote:
>
>> Single-character words are treated as noise by default, so something like
>> j@xxxxxxx would first be broken into "j foo com" (since @ and . are both
>> word-breaking characters); of these, only foo and com would be indexed.
>>
>> So only "foo com" was indexed, but when you query for j@xxxxxxx you should
>> still find a match, since query-time and index-time noise-word behaviour
>> should be the same.
>>
>> Can you reproduce it? If so I can take a look and try to explain it.
>>
>>
>> --
>> Dave Poole
>> SQL Server Fulltext Team
>>
>> This posting is provided "AS IS" with no warranties, and confers no rights.
>> Use of included script samples are subject to the terms specified at
>> http://www.microsoft.com/info/cpyright.htm.
>>
>>
>>
>> "Jim Sneeringer" <JimSneeringer@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
>> news:6C685B68-03E0-46C6-B244-99533E2F9DC8@xxxxxxxxxxxxxxxx
>> > After further testing, I think @ is a word breaker. Apparently something
>> > else in my first test caused me to think it wasn't.
>> >
>> > Maybe it has to do with noise words. The search that failed was for an
>> > address with only one letter (the letter "j") in front of the @. When I
>> > put some other e-mail addresses in and searched for them, it found them
>> > just fine.
>> >
>> > Could it be that noise words must be removed from the query, or the
>> > result is not found? In other words, did SQL possibly ignore the "j" in
>> > the database but not ignore the "j" in the query?
>> >
>> > Or maybe there is some other anomaly that caused that e-mail address not
>> > to be found.