Re: Where can I find a list of word breakers?



The functionality you mention below exists in SQL 2005. Which version are
you using?

If you are using 2005, you need to set the sp_configure option 'transform
noise words' to 1, i.e.:

exec sp_configure 'show advanced options', 1  -- needed: this is an advanced option
reconfigure
exec sp_configure 'transform noise words', 1
reconfigure

This causes noise words to be effectively removed from queries that use
Boolean operators: in an AND query a noise word evaluates to true, and in
an OR query it evaluates to false.
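
A minimal sketch of the effect, assuming a hypothetical table dbo.Messages
with a full-text index on its Body column (neither name comes from this
thread) and a placeholder address j@example.com. With the option at its
default of 0, SQL Server 2005 returns no rows for a full-text query that
contains a noise word and raises a warning instead. With the option set to
1, the single-letter noise word "j" is handled as described above:

```sql
-- Hypothetical table dbo.Messages(MessageId, Body), full-text indexed on Body.
-- "j" is a single-letter noise word. With 'transform noise words' = 1 it
-- evaluates to true inside the AND, so only "example" and "com" decide
-- which rows match.
SELECT MessageId
FROM dbo.Messages
WHERE CONTAINS(Body, '"j" AND "example" AND "com"');

-- Inside an OR the noise word evaluates to false instead, so this matches
-- the same rows as CONTAINS(Body, '"example"').
SELECT MessageId
FROM dbo.Messages
WHERE CONTAINS(Body, '"j" OR "example"');
```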

However, you should be able to query for the raw term 'j@xxxxxxx' rather
than breaking it up into terms and combining them with AND. An AND query
can return more documents than it should, because the terms may appear
anywhere in a document rather than next to each other. Searching for
j@xxxxxxx will only hit documents that have foo and com in adjacent
positions. Also, if the part before the @ were a real word, as in
dave@xxxxxxx, the search would require "dave foo com" in adjacent
positions, which is even more precise.
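
To illustrate the difference, a sketch against a hypothetical table
dbo.Messages with a full-text index on Body, using the placeholder address
j@example.com (not the address from this thread). The first query passes
the address as one quoted term, which the word breaker splits at @ and .
into tokens that must occur in adjacent positions; the second is the AND
form, which matches rows where the terms occur anywhere:

```sql
-- Hypothetical table dbo.Messages(MessageId, Body), full-text indexed on Body;
-- j@example.com is a placeholder address.

-- Phrase form: the term is broken into tokens at @ and ., and the indexed
-- tokens must appear next to each other ("example" followed by "com").
SELECT MessageId
FROM dbo.Messages
WHERE CONTAINS(Body, '"j@example.com"');

-- AND form: each term may appear anywhere in Body, so rows that mention
-- "example" in one place and "com" somewhere else also match.
SELECT MessageId
FROM dbo.Messages
WHERE CONTAINS(Body, '"example" AND "com"');
```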

--
Dave Poole
SQL Server Fulltext Team

This posting is provided "AS IS" with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm.

"Jim Sneeringer" <JimSneeringer@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:D7F0601D-D4E1-4F18-ABC4-7F145E7FC1C2@xxxxxxxxxxxxxxxx
> Thanks for your answer.
>
> It seems to be just as you say. Both the query and the database text
> contain j@xxxxxxxx. My parser, in order to match what SQL does, changes
> j@xxxxxxx into the "and" of the three words: j, foo and com. It works fine
> unless one of the words is a single letter. It looks to me as if SQL
> ignores one-letter words in the database but does not ignore them in the
> query, so this situation results in a non-hit where the user expects to
> get a hit. I would expect SQL to apply the same rules in both places. If a
> user searches for a noise word (either one in the list or a single-letter
> word), it should be dropped from the query rather than causing a hit to be
> overlooked.
>
> "Dave Poole" wrote:
>
>> Single-character words are treated as noise by default, so something like
>> j@xxxxxxx would first be broken into "j foo com" (since @ and . are both
>> word-breaking characters); of these, only foo and com would be indexed.
>>
>> So only foo and com were indexed, but when you query for j@xxxxxxx you
>> should still find a match, since noise-word behaviour should be the same
>> at query time and at index time.
>>
>> Can you reproduce it? If so I can take a look and try to explain it.
>>
>> "Jim Sneeringer" <JimSneeringer@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
>> news:6C685B68-03E0-46C6-B244-99533E2F9DC8@xxxxxxxxxxxxxxxx
>> > After further testing, I think @ is a word breaker. Apparently something
>> > else in my first test caused me to think it wasn't.
>> >
>> > Maybe it has to do with noise words. The search that failed was for an
>> > e-mail address with only one letter (the letter "j") in front of the @.
>> > When I put some other e-mail addresses in and searched for them, it
>> > found them just fine.
>> >
>> > Could it be that noise words must be removed from the query, or the
>> > result is not found? In other words, did SQL possibly ignore the "j" in
>> > the database, but not ignore the "j" in the query?
>> >
>> > Or maybe there is some other anomaly that caused that e-mail address
>> > not to be found.
>>
>>
>>
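
On SQL Server 2008 and later (not the 2000/2005 versions discussed in this
thread), the sys.dm_fts_parser function shows how the word breaker and
stoplist treat a given string, which is a direct way to check a split like
the "j foo com" one described above. A sketch using the placeholder address
j@example.com, LCID 1033 (US English), the system stoplist (stoplist_id 0)
and accent insensitivity (0); the function requires sysadmin membership,
and newer word breakers may tokenize e-mail addresses differently from the
2005 behaviour described here:

```sql
-- SQL Server 2008 and later only; requires sysadmin.
-- Arguments: query string, LCID 1033 (US English), stoplist_id 0 (system
-- stoplist), accent sensitivity 0 (insensitive).
-- j@example.com is a placeholder address.
SELECT display_term, occurrence, special_term
FROM sys.dm_fts_parser('"j@example.com"', 1033, 0, 0);
-- special_term shows how each token is treated, e.g. an exact match
-- or a noise word.
```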

