Re: relevance sorting with multiple search terms?


"Hilary Cotter" <hilary.cotter@xxxxxxxxx> wrote in message
news:%2306K4BCZGHA.4916@xxxxxxxxxxxxxxxxxxxxxxx
If you know in advance all the possibilities of the search terms, I would
use the expansion type of the thesaurus option.


Hi Hilary: I'm not sure I understand what you mean by "all the
possibilities". This is a user-controlled search tool; people just type
in whatever they're looking for.

I could compile a sort of "actionable words" list of words that should be
"demoted" in relevance, such as:

bulk
candy
bar

...but I don't know if that's what you mean?

Or do you mean creating my own custom dictionary of search "keywords" with
weightings?

Or are you talking about something other than weighting altogether?


--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



"msft-sql" <aklist@xxxxxxxxxxxxx> wrote in message
news:%23VUics$YGHA.1196@xxxxxxxxxxxxxxxxxxxxxxx
Hi: I'm trying to get a handle on the best way to approach this issue.

I have a product database of candy with perhaps 5000 products, and I'm
indexing the product name and description fields (both varchars).

People will search for "easter candy" for example.

Splitting the string and OR-ing the terms in a CONTAINS query usually
produces too broad a result, returning every product with "candy" in the
name. A FREETEXT query is also too broad.

Searching with an "AND" is too limiting, since I want to return
"chocolate easter egg" even if "candy" is nowhere in the name or
description.

A proximity search can be too limiting as well, because the words could
be completely separate in the description, e.g. "These chocolate easter
eggs are the perfect type of candy for ..."

I've tried a weighted search, like:

    select * from products
    where contains (name, 'isabout(easter Weight(1.0), candy Weight(0.0))')

...but that produces the same results even if I reverse the weighting for
the two terms. Maybe I'm not writing the query correctly?
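Getting the same rows no matter how the weights are set is what CONTAINS does by design. It is a yes/no predicate. ISABOUT weights do not change which rows match. They only change a rank value, and CONTAINS never exposes that value. CONTAINSTABLE returns it as a RANK column alongside the KEY of each matching row, so the weights only become visible when you join against CONTAINSTABLE and order by RANK.

The Python sketch below only assembles that T-SQL text. The helper name `ranked_query` and the key column `ProductID` are hypothetical. In application code, pass the search condition as a query parameter rather than splicing it into the string. It is inlined here only to show the finished statement.

```python
def ranked_query(table: str, key_column: str, ft_column: str, condition: str) -> str:
    """Return T-SQL that orders rows by a full-text condition's rank.

    CONTAINS only filters rows; CONTAINSTABLE also returns a RANK column,
    which is where ISABOUT weights show up. Table and column names are
    interpolated as-is, so they must be trusted constants, not user input.
    """
    # Double any single quotes so the condition is a valid T-SQL string literal.
    literal = condition.replace("'", "''")
    return (
        "SELECT p.*, ft.RANK\n"
        f"FROM {table} AS p\n"
        f"INNER JOIN CONTAINSTABLE({table}, {ft_column}, '{literal}') AS ft\n"
        f"    ON p.{key_column} = ft.[KEY]\n"
        "ORDER BY ft.RANK DESC;"
    )


# ProductID is a hypothetical key column; substitute the table's real
# full-text key (the unique index named when the full-text index was created).
print(ranked_query("products", "ProductID", "name",
                   "isabout(easter Weight(1.0), candy Weight(0.0))"))
```

With this form, reversing the two weights should change the ordering even though the set of matching rows stays the same. Rank values are computed by the full-text engine and are only meaningful relative to other rows in the same query.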

Has anyone else dealt with this before? I could add words like "candy",
"bar", "bulk", "package", etc. to the noise list, but I don't want to
exclude them all the time. For example, if someone searches for "bulk
chocolate", I don't want to drop the word "bulk" and then return every
instance of "chocolate". Similarly, I don't want to return hits on "bulk
lemon drops" when someone is searching for "bulk chocolate".

It seems like weighting is the way to go, and I can even maintain an
array of low-weight words and dynamically assign the weight to them when
building the query, but it doesn't seem to work properly.
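The "array of low-weight words" idea can be sketched as a function that turns the user's raw input into an ISABOUT condition. Demoted words get a small weight and every other word gets full weight. The word list, the helper name `isabout_condition`, and the default weights 0.1 and 1.0 are all assumptions for illustration. For the weights to have any visible effect, the resulting condition has to go through CONTAINSTABLE and be ordered by RANK, not used in a plain CONTAINS.

```python
# Hypothetical demotion list built from the words mentioned in this thread.
LOW_WEIGHT_WORDS = {"candy", "bar", "bulk", "package"}


def isabout_condition(user_input: str, low: float = 0.1, high: float = 1.0) -> str:
    """Build an ISABOUT(...) full-text condition from free-form user input.

    Words in LOW_WEIGHT_WORDS get the low weight; every other word gets the
    high weight. Both weights must lie in 0.0..1.0, the range WEIGHT accepts.
    """
    for w in (low, high):
        if not 0.0 <= w <= 1.0:
            raise ValueError(f"weight {w} is outside 0.0..1.0")

    terms = []
    seen = set()
    for raw in user_input.lower().split():
        # Keep only letters and digits so quotes, parentheses, hyphens, etc.
        # in user input cannot break the full-text query syntax.
        word = "".join(ch for ch in raw if ch.isalnum())
        if not word or word in seen:
            continue  # skip empty tokens and repeated words
        seen.add(word)
        weight = low if word in LOW_WEIGHT_WORDS else high
        terms.append(f'"{word}" WEIGHT({weight:.2f})')

    if not terms:
        raise ValueError("no searchable words in input")
    return "ISABOUT(" + ", ".join(terms) + ")"


print(isabout_condition("bulk chocolate"))
# ISABOUT("bulk" WEIGHT(0.10), "chocolate" WEIGHT(1.00))
```

Unlike a noise word, a demoted word still matches. A product that mentions only "bulk" (such as "bulk lemon drops") would still be returned for "bulk chocolate", just ranked below products matching "chocolate". A RANK cutoff or TOP n may therefore still be needed.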

Any suggestions would be appreciated!
