RE: How EXACTLY does Indexing Service determine rank
From: George Cheng [MSFT] (GCheng_at_online.microsoft.com)
Date: 06/10/04
- Previous message: Denis: "Indexing services and quotes"
- In reply to: rforte_at_bloomberg.net: "How EXACTLY does Indexing Service determine rank"
- Next in thread: cawoodm_at_yahoo.com: "RE: How EXACTLY does Indexing Service determine rank"
- Reply: cawoodm_at_yahoo.com: "RE: How EXACTLY does Indexing Service determine rank"
- Messages sorted by: [ date ] [ thread ]
Date: Thu, 10 Jun 2004 17:22:23 GMT
1. The number of times a word appears in a document divided by the total
number of words in the document. This is further weighted so that hits in
areas like headers or titles weigh more than hits in the body of the
document.
2. The closer the searched-for words are to each other, the higher the
rank, up to the point where they are adjacent and form a phrase, which
raises the rank even higher.
3. The ranking mechanism is weighted so that the more highly inflected the
word is from the version originally asked for, the lower its rank in the
result set. For example, "swim" would be closer to "swims" and further from
"swimmer" because "swim" and "swimmer" are less related grammatically. In
other words, the plural noun form is more related grammatically than the
past-tense verb form of the same word. When resolving queries, the
linguistic engine and ranking algorithm take these linguistic features into
account.
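To make the first two factors concrete, here is a toy Python sketch. It is
not the Indexing Service algorithm, which is unpublished: the field weights,
the 1/distance bonus, and every function name below are assumptions made up
for illustration. The third factor (inflection) is not modeled.

```python
# Toy illustration only. The weights and formulas are invented for this
# sketch and are NOT the Indexing Service's real ranking algorithm.

FIELD_WEIGHTS = {"title": 3.0, "header": 2.0, "body": 1.0}  # hypothetical


def weighted_density(doc, term):
    """Factor 1: field-weighted hit count divided by total word count."""
    total_words = sum(len(words) for words in doc.values())
    if total_words == 0:
        return 0.0
    weighted_hits = sum(
        FIELD_WEIGHTS.get(field, 1.0) * words.count(term)
        for field, words in doc.items()
    )
    return weighted_hits / total_words


def min_distance(words, term_a, term_b):
    """Smallest gap in word positions between two terms, or None."""
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)


def proximity_bonus(words, term_a, term_b):
    """Factor 2: bonus grows as terms get closer; adjacent terms score 1.0."""
    distance = min_distance(words, term_a, term_b)
    if not distance:  # a term is missing, or both terms are the same word
        return 0.0
    return 1.0 / distance


# One title hit versus two body hits, in documents of equal length.
title_hit = {"title": ["fish"], "body": ["we", "sell", "bait", "and", "tackle"]}
body_hits = {"title": ["shop"], "body": ["fish", "and", "more", "fish", "here"]}
print(weighted_density(title_hit, "fish"))
print(weighted_density(body_hits, "fish"))

# Adjacent words earn a larger bonus than words one position further apart.
words = "tropical fish tank".split()
print(proximity_bonus(words, "tropical", "fish"))
print(proximity_bonus(words, "tropical", "tank"))
```

With these made-up weights, the single title hit scores above the two body
hits, which is the kind of result the original question describes. Different
weights would give a different ordering.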
Index Server doesn't treat ranking as "x words per document" but rather
bases it on word density. Take a document with 200 words and a document
with 20,000 words, each containing one instance of the word searched for.
The one with 200 words has a density of 1/200, which is higher than the
1/20,000 of the larger one. So a small document with one hit can outweigh a
larger document with more hits. Your result set may contain all of the same
results, but the ranking values may never be consistent because of the
"arbitrary algorithm" used to calculate them.
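The density arithmetic can be checked with a few lines of Python. This only
covers raw density as described above, not the full rank. The third case
(50 hits in 20,000 words) is a hypothetical figure added to show a larger
document with more hits still losing to the small one.

```python
def density(hits, total_words):
    """Raw word density: occurrences of the term divided by document length."""
    return hits / total_words


small = density(1, 200)            # 1/200    = 0.005
large = density(1, 20_000)         # 1/20,000 = 0.00005
large_many = density(50, 20_000)   # hypothetical: 50/20,000 = 0.0025

# The 200-word document wins on density in both comparisons.
print(small > large)
print(small > large_many)
```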
There is no way to change the ranking mechanism. There are books on the
common algorithms used in the field of indexing, but there are no
whitepapers. The Indexing Service is based on ranking formulas that are used
everywhere from statistics to molecular biology. These are not listed in any
articles or white papers because they are subject to change in future
versions based on user feedback and performance tweaking.
Thank You
George Cheng
Microsoft Application Center & Index Server Support
Note: This article has no warranties implicit or explicit.
All the content is given on the "as is" basis and the user
takes full responsibility for its use and assumption.
Microsoft Corporation Copyright 2004
All Rights Reserved
--------------------
| From: Ron Forte (rforte@bloomberg.net)
| Subject: How EXACTLY does Indexing Service determine rank
| Message-ID: <e4wsOwvTEHA.2716@tk2msftngp13.phx.gbl>
| Newsgroups: microsoft.public.inetserver.indexserver
| Date: Thu, 10 Jun 2004 08:03:14 -0700
| NNTP-Posting-Host: shared2.orcsweb.com 66.129.69.1
| Lines: 1
| Path: cpmsftngxa10.phx.gbl!TK2MSFTFEED01.phx.gbl!TK2MSFTNGP08.phx.gbl!tk2msftngp13.phx.gbl
| Xref: cpmsftngxa10.phx.gbl microsoft.public.inetserver.indexserver:29097
| X-Tomcat-NG: microsoft.public.inetserver.indexserver
|
| So, Googling for the past 3 hours has gotten me ZERO information on how
| Microsoft Indexing Service determines the numeric value it assigns to Rank.
| The problem we are having is that we've got an ASP search page that
| queries the indexing catalog on a particular directory on one of our web
| servers. It's sorted by rank[d], and there are many instances where higher
| results contain fewer instances of the search terms in the file's title,
| metadata, and content than much lower results.
|
| What we are trying to determine is the algorithm that Indexing Service
| uses to rank certain files higher than others. For example, if the word
| "fish" appears once in the title for one document, does that get a higher
| ranking than a file that contains "fish" twice in the content, but not at
| all in the title? Or, would one appearance of the word near the top of a
| document's contents allow it to rank higher than if it appeared twice near
| the very end of a document?
|
| Where can I get this kind of information? Books? Online tutorials?
| Because msdn.microsoft.com certainly has absolutely no information of the
| sort whatsoever anywhere in the Knowledge Base or elsewhere.
|
| This one is killing me. Any help will be most highly and eternally
| appreciated.