Re: "Google Like" with weighted searches project




DQ dont quit wrote:
> I'm currently working on an ASP.NET / C# / SQL Server 2000 project in which a web user enters keywords and the application then searches MS Word documents for those words.

Good project.


> This information will then be used to perform weighted searches on the keywords and text of multiple MS Word documents. How might this best be accomplished? Should I perform full-text searches on the Word files, or store the data in a database (by copying and pasting each document into a web app page)?

What would be cool is to copy each document into a database word by word, giving every word an integer for its position in the document, plus other information that would be useful in searching (page number, paragraph number, and so on).
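A minimal sketch of that splitting step, assuming the plain text has already been extracted from the Word file and that paragraphs are separated by blank lines. The names `WordOccurrence` and `split_into_words` are hypothetical, and the page number is left out because it would have to come from Word's own pagination.

```python
import re
from typing import List, NamedTuple


class WordOccurrence(NamedTuple):
    """One row of a hypothetical word table: a word and where it occurs."""
    word: str
    position: int   # ordinal of the word within the whole document, starting at 1
    paragraph: int  # paragraph number, starting at 1


# Simplifying assumption: a word is a run of letters, digits, and apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def split_into_words(text: str) -> List[WordOccurrence]:
    """Break plain document text into per-word rows with position information."""
    rows: List[WordOccurrence] = []
    position = 0
    # Assumed convention: a blank line separates paragraphs.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for para_no, para in enumerate(paragraphs, start=1):
        for match in WORD_RE.finditer(para):
            position += 1
            rows.append(WordOccurrence(match.group().lower(), position, para_no))
    return rows
```

For example, `split_into_words("The cat sat.\n\nA dog ran.")` yields six rows, with positions 1 to 6 and paragraph numbers 1, 1, 1, 2, 2, 2.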


So you would build a text-search object model around one atomic unit in the database: a row for each word occurrence (I don't know of many words longer than 255 characters!), with a clustered index on the word's position in the document.
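A sketch of that table, using SQLite from Python's standard library as a stand-in for SQL Server 2000. The table and column names are hypothetical. In SQL Server you would declare a CLUSTERED index (or clustered primary key) on (doc_id, position); a SQLite `WITHOUT ROWID` table with that composite primary key is the closest analogue, since its rows are stored in key order. A secondary index on `word` serves keyword lookups.

```python
import sqlite3

# Hypothetical schema: one row per word occurrence.
SCHEMA = """
CREATE TABLE documents (
    doc_id   INTEGER PRIMARY KEY,
    filename TEXT NOT NULL
);
CREATE TABLE word_occurrences (
    doc_id    INTEGER NOT NULL REFERENCES documents(doc_id),
    position  INTEGER NOT NULL,      -- ordinal of the word in the document
    paragraph INTEGER NOT NULL,
    word      VARCHAR(255) NOT NULL, -- a single word easily fits in 255 characters
    PRIMARY KEY (doc_id, position)   -- rows stored in document order
) WITHOUT ROWID;
-- Secondary index so that a keyword lookup does not scan the whole table.
CREATE INDEX ix_word ON word_occurrences (word);
"""


def create_store() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, doc_id, filename, words):
    """Store a document given as a list of (word, position, paragraph) tuples."""
    conn.execute("INSERT INTO documents (doc_id, filename) VALUES (?, ?)",
                 (doc_id, filename))
    conn.executemany(
        "INSERT INTO word_occurrences (doc_id, word, position, paragraph) "
        "VALUES (?, ?, ?, ?)",
        [(doc_id, w, pos, para) for (w, pos, para) in words],
    )
    conn.commit()
```

Because every row holds a single word, the 255-character limit only applies to individual words, never to the document as a whole.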

Then you would add a table of noise words (common words such as "the" and "and" that are not worth indexing) and a table of similar words (singular mapped to plural).
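One way those two tables might be applied, sketched with hard-coded sets standing in for the database tables (their contents and the function names are hypothetical). Here each plural maps to its singular; the direction does not matter as long as the same mapping is applied both when indexing documents and when reading the user's keywords.

```python
from typing import Dict, List, Optional, Set

# Hypothetical contents of the two lookup tables; a real system would load
# them from the database instead of hard-coding them.
NOISE_WORDS: Set[str] = {"a", "an", "and", "the", "of", "to", "in"}
SIMILAR_WORDS: Dict[str, str] = {  # variant form -> canonical form
    "documents": "document",
    "searches": "search",
    "mice": "mouse",
}


def normalize(word: str) -> Optional[str]:
    """Return the canonical form of a word, or None if it is a noise word."""
    w = word.lower()
    if w in NOISE_WORDS:
        return None
    return SIMILAR_WORDS.get(w, w)


def normalize_query(keywords: List[str]) -> List[str]:
    """Normalize the user's keywords, dropping noise words."""
    result = []
    for k in keywords:
        n = normalize(k)
        if n is not None:
            result.append(n)
    return result
```

With these tables, `normalize_query(["The", "Documents", "of", "searches"])` returns `["document", "search"]`.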

OK, then a search is fast: find every occurrence of each keyword, grab its position, and offer the list to the user. Finally, you would build an interface that takes the user to that place inside the document.
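A sketch of the search and of recovering the surrounding text for a hit, again with SQLite standing in for SQL Server. `build_demo`, `find_occurrences`, and `context` are hypothetical names, and the demo table is a small self-contained copy of a one-row-per-word layout. The keyword lookup can use the index on `word`, while rebuilding the context is a range read on (doc_id, position), which is where storing rows in position order pays off.

```python
import sqlite3
from typing import List, Tuple


def build_demo() -> sqlite3.Connection:
    """Create a tiny in-memory word table holding one short document."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, filename TEXT NOT NULL);
        CREATE TABLE word_occurrences (
            doc_id INTEGER NOT NULL, position INTEGER NOT NULL,
            paragraph INTEGER NOT NULL, word VARCHAR(255) NOT NULL,
            PRIMARY KEY (doc_id, position)
        ) WITHOUT ROWID;
        CREATE INDEX ix_word ON word_occurrences (word);
    """)
    conn.execute("INSERT INTO documents VALUES (1, 'memo.doc')")
    text = "please review the budget before the budget meeting"
    # Columns: doc_id, position, paragraph, word
    conn.executemany("INSERT INTO word_occurrences VALUES (1, ?, 1, ?)",
                     list(enumerate(text.split(), start=1)))
    return conn


def find_occurrences(conn, keyword: str) -> List[Tuple[int, str, int, int]]:
    """Return (doc_id, filename, position, paragraph) for every hit."""
    sql = """
        SELECT w.doc_id, d.filename, w.position, w.paragraph
        FROM word_occurrences AS w
        JOIN documents AS d ON d.doc_id = w.doc_id
        WHERE w.word = ?
        ORDER BY d.filename, w.position
    """
    return conn.execute(sql, (keyword.lower(),)).fetchall()


def context(conn, doc_id: int, position: int, width: int = 2) -> str:
    """Rebuild the words around a hit so the user can see where it is."""
    rows = conn.execute(
        "SELECT word FROM word_occurrences "
        "WHERE doc_id = ? AND position BETWEEN ? AND ? ORDER BY position",
        (doc_id, position - width, position + width),
    ).fetchall()
    return " ".join(r[0] for r in rows)
```

On the demo data, `find_occurrences(conn, "budget")` returns hits at positions 4 and 7, and `context(conn, 1, 4)` returns `"review the budget before the"`; a user interface could link each hit to its paragraph number.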


> If I store it in a database, how would I store more than 255 characters and still be able to search for specific words? Thanks in advance for your reply!