RE: How EXACTLY does Indexing Service determine rank

From: George Cheng [MSFT] (GCheng_at_online.microsoft.com)
Date: 06/10/04

    Date: Thu, 10 Jun 2004 17:22:23 GMT
    
    

    1. The number of times a word appears in a document, divided by the total
    number of words in the document. Hits in areas such as headers or titles
    weigh more than hits in the body of the document.

    2. The closer the searched-for words are to each other, the higher the
    rank. Once they are adjacent they form a phrase, which raises the rank
    even higher.

    3. The ranking mechanism is weighted so that the more highly inflected a
    word is relative to the form originally asked for, the lower it ranks in
    the result set. For example, "swims" would rank closer to "swim" than
    "swimmer" would, because "swim" and "swimmer" are less related
    grammatically. In other words, the plural noun form is more closely
    related grammatically than the past-tense verb form of the same word.
    When resolving queries, the linguistic engine and ranking algorithm take
    these linguistic features into account.
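
To make the proximity factor in point 2 concrete, here is a minimal Python sketch. It is not the Indexing Service formula, which is unpublished. The function names, the 1/gap scoring and the PHRASE_BONUS value are all assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical multiplier applied when the two terms are adjacent (a phrase).
PHRASE_BONUS = 2.0


def min_gap(tokens, term_a, term_b):
    """Smallest distance, in word positions, between the two terms.

    Returns None if either term is missing from the document.
    """
    pos_a = [i for i, tok in enumerate(tokens) if tok == term_a]
    pos_b = [i for i, tok in enumerate(tokens) if tok == term_b]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a, b in product(pos_a, pos_b))


def proximity_score(tokens, term_a, term_b):
    """Illustrative score: closer terms score higher, adjacent terms highest."""
    if term_a == term_b:
        raise ValueError("this sketch expects two distinct terms")
    gap = min_gap(tokens, term_a, term_b)
    if gap is None:
        return 0.0
    score = 1.0 / gap          # assumed shape: score falls as the gap grows
    if gap == 1:               # adjacent words are treated as a phrase
        score *= PHRASE_BONUS
    return score


near = "fresh fish market".split()
far = "fresh bread and smoked fish".split()

print(proximity_score(near, "fresh", "fish"))  # 2.0
print(proximity_score(far, "fresh", "fish"))   # 0.25
```

The only property taken from the description above is the ordering: a smaller gap gives a higher score, and adjacency gives an extra boost.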

    Index Server doesn't treat ranking as "x words per document" but bases it
    on word density. Take a document with 200 words and a document with
    20,000 words, each containing one instance of the searched-for word. The
    200-word document has a density of 1/200, which is higher than the other
    document's 1/20,000. So a small document with one hit can outweigh a
    larger document with more hits. Your result set may contain all of the
    same results, but the ranking values may never be consistent because of
    the "arbitrary algorithm" used to calculate them.
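
The density arithmetic above can be checked with a few lines of Python. This only reproduces the hits-divided-by-total-words ratio from the example. The function name and the five-hit case are assumptions added for illustration, and the real rank also includes the other factors listed earlier.

```python
from fractions import Fraction


def density(hits, total_words):
    """Word density as described above: hits divided by total word count."""
    return Fraction(hits, total_words)


small = density(1, 200)                # one hit in a 200-word document
large = density(1, 20_000)             # one hit in a 20,000-word document
larger_more_hits = density(5, 20_000)  # hypothetical: five hits in the large document

print(small > large)             # True: 1/200 > 1/20,000
print(small > larger_more_hits)  # True: 1/200 > 5/20,000 (= 1/4,000)
```

On density alone, the large document would need 100 hits just to match the single hit in the 200-word document.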

    There is no way to change the ranking mechanism. There are books on the
    common algorithms used in the field of indexing, but there are no
    whitepapers. The Indexing Service is based on ranking formulas that are
    used everywhere from statistics to molecular biology. These are not
    listed in any articles or white papers because they are subject to change
    in future versions based on user feedback and performance tweaking.

    Thank You

    George Cheng

    Microsoft Application Center & Index Server Support

    Note: This article has no warranties implicit or explicit.
    All the content is given on the "as is" basis and the user
    takes full responsibility for its use and assumption.
    Microsoft Corporation Copyright 2004
    All Rights Reserved

    --------------------
    | From: Ron Forte (rforte@bloomberg.net)
    | Subject: How EXACTLY does Indexing Service determine rank
    | Message-ID: <e4wsOwvTEHA.2716@tk2msftngp13.phx.gbl>
    | Newsgroups: microsoft.public.inetserver.indexserver
    | Date: Thu, 10 Jun 2004 08:03:14 -0700
    | NNTP-Posting-Host: shared2.orcsweb.com 66.129.69.1
    | Lines: 1
    | Path:
    cpmsftngxa10.phx.gbl!TK2MSFTFEED01.phx.gbl!TK2MSFTNGP08.phx.gbl!tk2msftngp13
    .phx.gbl
    | Xref: cpmsftngxa10.phx.gbl microsoft.public.inetserver.indexserver:29097
    | X-Tomcat-NG: microsoft.public.inetserver.indexserver
    |
    | So, Googling for the past 3 hours has gotten me ZERO information on how
    Microsoft Indexing Service determines the numeric value it assigns to Rank.
     The problem we are having is that we've got an asp search page that
    queries the indexing catalog on a particular directory on one of our web
    servers. It's sorted by rank[d], and there are many instances where higher
    results contain fewer instances of the search terms in the file's title,
    metadata, and content than much lower results.
    |
    | What we are trying to determine is the algorithm that Indexing Service
    uses to rank certain files higher than others. For example, if the word
    "fish" appears once in the title for one document, does that get a higher
    ranking than a file that contains "fish" twice in the content, but not at
    all in the title? Or, would one appearance of the word near the top of a
    document's contents allow it to rank higher than if it appeared twice near
    the very end of a document?
    |
    | Where can I get this kind of information? Books? Online tutorials?
    Because msdn.microsoft.com certainly has absolutely no information of the
    sort whatsoever anywhere in the Knowledge base or elsewhere.
    |
    | This one is killing me. Any help will be most highly and eternally
    appreciated.
    |

