Re: Where can I find a list of word breakers?
- From: "Dave Poole" <dp00le@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Wed, 4 Jan 2006 17:24:32 -0800
"Transform noise words" is a new option in SQL 2005, but it only affects
Boolean (AND/OR) queries.
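As a rough model of what the option does, here is a Python sketch of the rule Dave gives further down this thread: with "transform noise words" enabled, a noise term counts as true inside an AND and as false inside an OR. The query-tree format, the function names and the `NOISE_WORDS` set are hypothetical. Treating a bare noise term as matching nothing is also an assumption made here, not documented behaviour.

```python
# Toy model of the 'transform noise words' rule: noise is true inside AND, false inside OR.
# NOISE_WORDS is a hypothetical stand-in for SQL Server's language-specific noise list.
NOISE_WORDS = {"an", "and", "of", "the"}


def is_noise(term: str) -> bool:
    # Single-character words are noise by default, as are listed noise words.
    return len(term) == 1 or term in NOISE_WORDS


def matches(node, doc_terms: set, parent_op=None) -> bool:
    """Evaluate a query tree such as ("AND", "j", "foo", "com") against lowercase indexed terms."""
    if isinstance(node, str):
        term = node.lower()
        if is_noise(term):
            # Neutral for the enclosing operator; a bare noise term matches nothing (assumption).
            return parent_op == "AND"
        return term in doc_terms
    op, *children = node
    if op not in ("AND", "OR"):
        raise ValueError(f"unsupported operator: {op}")
    results = [matches(child, doc_terms, op) for child in children]
    return all(results) if op == "AND" else any(results)
```

For example, `matches(("AND", "j", "foo", "com"), {"foo", "com"})` is true because the single letter "j" is neutral inside the AND, while `("OR", "the", "bar")` does not match the same document because "the" is neutral (false) inside the OR.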
SQL 2000 does support phrase queries, though, and they should work fine when
one or more of the words is a noise word.
A real phrase query is something like this:
select * from table where contains(column,'"hello world"')
(note the double quotes)
The query below (without the double quotes) is invalid and gives a syntax
error:
select * from table where contains(column,'hello world')
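When the search text comes from user input, a small client-side helper can add the double quotes so that multi-word input becomes a phrase query instead of a syntax error. This is a minimal Python sketch; the name `phrase_condition` is hypothetical, and rejecting embedded double quotes is a conservative assumption made here rather than documented SQL Server behaviour.

```python
def phrase_condition(term: str) -> str:
    """Wrap a user-supplied term in double quotes so CONTAINS treats it as a phrase."""
    term = term.strip()
    if not term:
        raise ValueError("empty search term")
    if '"' in term:
        # Assumption: reject embedded double quotes rather than guess at an escaping rule.
        raise ValueError("search term must not contain double quotes")
    return '"' + term + '"'
```

The result should be passed to `contains(column, ...)` as a query parameter rather than pasted into the SQL text, so single quotes in the user's input need no escaping.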
A query like foo@xxxxxxx is more like a compound-word query. Since it
contains no whitespace, you can use it with or without double quotes, but
it is generally a good idea to use double quotes anyway.
For example, both of the following are valid:
select * from table where contains(column,'foo@xxxxxxx')
select * from table where contains(column,'"foo@xxxxxxx"')
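To see what such a compound term turns into, the splitting described later in this thread (@ and . are breaking characters, and single-character words are noise by default) can be modelled with a toy word breaker. This Python sketch reproduces only those two rules; real SQL Server word breakers are language-specific and handle far more cases. The address `j@example.com`, the function names and the `NOISE_WORDS` set are hypothetical, and the pattern only recognises ASCII letters and digits.

```python
import re

# Hypothetical stand-in for a noise-word list; real lists are language-specific.
NOISE_WORDS = {"and", "of", "the"}


def break_words(text: str) -> list:
    """Toy word breaker: any run of non-alphanumeric characters (such as @ and .) is a break."""
    return [word for word in re.split(r"[^0-9a-z]+", text.lower()) if word]


def indexed_terms(text: str) -> list:
    """Keep only terms that would be indexed: drop single-character words and listed noise words."""
    return [word for word in break_words(text) if len(word) > 1 and word not in NOISE_WORDS]
```

Here `break_words("j@example.com")` gives `["j", "example", "com"]`, and `indexed_terms` drops the "j", leaving only `["example", "com"]`.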
Glad to be of help.
--
Dave Poole
SQL Server Fulltext Team
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm.
"Jim Sneeringer" <JimSneeringer@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:3BB85973-3D42-47FE-8EB0-075DD357A0D6@xxxxxxxxxxxxxxxx
> Thanks. I'm using 2005, and I'll do as you say. I didn't know about
> "transform noise words" or that SQL could do phrase searching.
>
> It is possible, though, that I'll be forced to deploy on SQL 2000 because
> of limitations of the hosting service. From what you say, I gather that
> "transform noise words" is not available in 2000. Is that right? What
> about phrase searches?
>
> Thanks again. This is a big help.
>
> "Dave Poole" wrote:
>
>> The functionality you mention below exists in SQL 2005. Which version are
>> you using?
>>
>> If you are using 2005, you need to set the sp_configure parameter
>> 'transform noise words' to 1, i.e.:
>>
>> exec sp_configure 'show advanced options', 1
>> reconfigure
>> exec sp_configure 'transform noise words', 1
>> reconfigure
>>
>> This causes noise words to be effectively removed from queries with
>> Boolean operators: in AND queries a noise word evaluates to true, and in
>> OR queries it evaluates to false.
>>
>> However, you should be able to query for the raw term 'j@xxxxxxx' rather
>> than breaking it up into terms and adding ANDs. Using ANDs will find more
>> documents than it should, since the terms could show up anywhere in the
>> document. Searching for j@xxxxxxx will only hit documents that have foo
>> and com in adjacent positions. Also, if the part before the @ were a real
>> word, as in dave@xxxxxxx, the search would match "dave foo com" in
>> adjacent positions, which is even better.
>>
>> "Jim Sneeringer" <JimSneeringer@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in
>> message
>> news:D7F0601D-D4E1-4F18-ABC4-7F145E7FC1C2@xxxxxxxxxxxxxxxx
>> > Thanks for your answer.
>> >
>> > It seems to be just as you say. Both the query and the database text
>> > contain j@xxxxxxxx. My parser, in order to match what SQL does, changes
>> > j@xxxxxxx into the AND of the three words j, foo and com. It works fine
>> > unless one of the words is a single letter. It looks to me as if SQL
>> > ignores one-letter words in the database but does not ignore them in
>> > the query, so this situation results in a non-hit where the user
>> > expects a hit. I would expect SQL to apply the same rules in both
>> > places. If a user searches for a noise word (either one in the list or
>> > a single-letter word), it should be dropped from the query rather than
>> > causing a hit to be overlooked.
>> >
>> > "Dave Poole" wrote:
>> >
>> >> Single-character words are treated as noise by default, so something
>> >> like j@xxxxxxx would first be broken into "j foo com" (since @ and .
>> >> are both breaking characters), and of these only foo and com would be
>> >> indexed.
>> >>
>> >> So only "foo com" was indexed, but when you query for j@xxxxxxx you
>> >> should still find a match, since query-time and index-time noise-word
>> >> behaviour should be the same.
>> >>
>> >> Can you reproduce it? If so, I can take a look and try to explain it.
>> >>
>> >> "Jim Sneeringer" <JimSneeringer@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in
>> >> message
>> >> news:6C685B68-03E0-46C6-B244-99533E2F9DC8@xxxxxxxxxxxxxxxx
>> >> > After further testing, I think @ is a word breaker. Apparently
>> >> > something else in my first test caused me to think it wasn't.
>> >> >
>> >> > Maybe it has to do with noise words. The search that failed was for
>> >> > an address with only one letter (the letter "j") in front of the @.
>> >> > When I put some other e-mail addresses in and searched for them, it
>> >> > found them just fine.
>> >> >
>> >> > Could it be that noise words must be removed from the query, or the
>> >> > result is not found? In other words, did SQL possibly ignore the "j"
>> >> > in the database but not ignore the "j" in the query?
>> >> >
>> >> > Or maybe there is some other anomaly that caused that e-mail address
>> >> > not to be found.
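The difference between Jim's AND rewrite and Dave's suggestion to search for the raw term can be sketched in Python over already-broken term lists. The function names and sample term lists are hypothetical, and the sketch assumes noise words are simply removed before positions are compared, which may not match exactly how SQL Server records word positions.

```python
def and_match(query_terms: list, doc_terms: list) -> bool:
    """AND rewrite: every query term must appear somewhere in the document."""
    present = set(doc_terms)
    return all(term in present for term in query_terms)


def phrase_match(query_terms: list, doc_terms: list) -> bool:
    """Phrase search: the query terms must appear in order, in adjacent positions."""
    n = len(query_terms)
    if n == 0:
        return False
    return any(doc_terms[i:i + n] == query_terms for i in range(len(doc_terms) - n + 1))


def drop_noise(terms: list) -> list:
    # Simplification: apply only the single-character noise rule, the same way on both sides.
    return [term for term in terms if len(term) > 1]


query = ["j", "foo", "com"]                    # the pieces of the address j@...
indexed = ["contact", "foo", "com", "today"]   # the "j" was dropped as noise at index time
scattered = ["foo", "fighters", "dot", "com"]  # foo and com present, but not adjacent
```

With these lists, `and_match(query, indexed)` fails because the index never stored "j" (Jim's symptom). After `drop_noise`, the AND rewrite finds both documents, while `phrase_match` finds only `indexed`, where foo and com are adjacent.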