Re: Zero Ranking
- From: "Dave Poole" <dp00le@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Tue, 3 Jan 2006 15:50:33 -0800
Your suspicion is correct. With such a small amount of data, the
ranking algorithms do some strange things. The main reason is that the
terms "red" and "juicy" occur in more than half of your documents. This
means the ranking algorithm treats them almost as noise words and
considers them very unimportant for ranking purposes.
If you were to add a few documents with unrelated content, you should start
seeing some non-zero ranks, since the terms would become more selective and
therefore a little more relevant to the rank.
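To illustrate the effect, here is a minimal Python sketch. It is not SQL
Server's actual ranking formula. It assumes a classic BM25-style inverse
document frequency weight, log((N - n + 0.5) / (n + 0.5)), clamped at zero,
where N is the number of indexed documents and n is the number containing
the term. The function name idf_weight and the figure of 12 extra documents
are made up for the example.

```python
import math


def idf_weight(total_docs: int, docs_with_term: int) -> float:
    """BM25-style IDF clamped at zero (an assumption, not SQL Server's formula)."""
    raw = math.log((total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # A term found in more than half of the documents gets a negative raw
    # weight, which is clamped to zero here, i.e. treated like a noise word.
    return max(0.0, raw)


# 'red' appears in 6 of the 8 documents listed in the question.
small_corpus = idf_weight(8, 6)

# The same 6 matches after adding 12 hypothetical unrelated documents.
larger_corpus = idf_weight(20, 6)

print(f"8 documents:  weight = {small_corpus:.3f}")
print(f"20 documents: weight = {larger_corpus:.3f}")
```

Under these assumptions the weight is zero with 8 documents and positive
with 20, so the term only starts to contribute to the rank once it becomes
more selective.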
Hope this helps
--
Dave Poole
SQL Server Fulltext Team
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm.
"Steve W" <BunnyBoy123@xxxxxxxxxxxxxxxx> wrote in message
news:eF7ZMotBGHA.272@xxxxxxxxxxxxxxxxxxxxxxx
> Hi,
>
> I am using SQL Server 2000 to store a number of small Word documents in an
> image field that is being FT indexed (table name 'docs', image field name
> 'document'; there is also a column 'title' containing the document name).
>
> The docs look like this:
>
> Doc1 : The red apple is juicy
> Doc2 : The red apple is juciy
> Doc3 : The red apple is jiucy
> Doc4 : The red apple is very juicy
> Doc5 : The red apple is juicy. So is the yellow banana.
> Doc6 : The bus is red.
> Doc7 : The car is yellow.
> Doc8 : The van is blue and yellow.
>
> Note that the mis-spelling of the word 'juicy' in docs 2 and 3 is
> deliberate.
>
> I am running the following query:
>
> SELECT fti.rank, Title
> FROM docs
> INNER JOIN FREETEXTTABLE(Docs, document, 'red juicy') AS fti
>     ON docs.Docid = fti.[key]
>
> which returns:
>
> rank Title
> ------ ----------
> 0 doc1.doc
> 0 doc2.doc
> 0 doc3.doc
> 0 doc4.doc
> 0 doc5.doc
> 0 doc6.doc
>
> Can anyone explain (a) what a rank of 0 means [I assume it just means that
> all the docs have the same rank], and (b) why docs 1, 4 and 5, which contain
> both 'red' and 'juicy', are not ranked higher than docs 2, 3 and 6, which
> don't contain the word 'juicy'?
>
> Also, am I being unreasonable in using very small amounts of text?
>
> Thanks in advance,
>
> Steve