Re: Search methods, "fuzzy" logic, duplicate checking, oh my!
From: John Kane (jt-kane_at_comcast.net)
Date: 03/02/04
- Next message: news.microsoft.com: "Re: Run script on another NON-IIS server"
- Previous message: Bob Barrows: "Re: Bugs in VBScript GetRef()"
- Messages sorted by: [ date ] [ thread ]
Date: Mon, 1 Mar 2004 17:38:08 -0800
William,
Depending upon the SQL Server version (7.0 or 2000) as well as the
OS-platform (Win2K or Win2003) that you are using -- SELECT @@version --
Full Text Search (FTS) can be used to satisfy most of your search
requirements. Additionally, SQL FTS queries (CONTAINS and FREETEXT) can be
combined with SOUNDEX &/or DIFFERENCE, if necessary. If you're truly looking
for a "smart search", there are other methods using pure T-SQL that can get
you to the next level of search as I'm writing a book on this subject.
Note, you can also post FTS related questions to the newsgroup:
microsoft.public.sqlserver.fulltext (cc'ed)
Regards,
John
"William Morris" <news.remove.this.and.the.dots@seamlyne.com> wrote in
message news:c1iidi$1ipqfp$1@ID-205671.news.uni-berlin.de...
> Apologies in advance for the cross-post. Not sure if this is better
handled
> in ASP code or TransactSQL.
>
> Windows2000 Server
> SQL 2000
>
> tblPeople
> contactid int
> firstname varchar(25)
> lastname varchar(25)
> address varchar(255)
> city varchar(100)
> state int
> zip varchar(10)
> homephone varchar(25)
> workphone varchar(25)
> mobilephone varchar(25)
>
> Our contact search screen allows the users to search for a contact by
phone,
> lastname, firstname, city, state, zip. If no matches are found, then the
> user is given the option of adding the information into the database as a
> new contact. When searching, all whitespace and punctuation is removed
for
> the comparison.
>
> The database is populated two ways:
>
> 1. by hand by our users, where the format is controlled through the
> interface - i.e. phone numbers always have an area code, and the numbers
are
> stored with dashes. Numbers, dashes, spaces, and parentheses are the only
> allow characters.
> 2. Client data as CSV files from a variety of proprietary sources, and
> the data could have any ol' crap in it. We scrub as best we can, but some
> garbage gets through.
>
> So, what we find is when doing the phone number search, someone with
> 816-555-1234 won't be found if the user enters 555-1234, and vice versa.
> Likewise, if any of the names are mis-spelled, the search will fail.
>
> We'd like to make the query "smart". We've explored SOUNDEX, but the
> results that come back often confuse the end-user: "Mercer" comes back
with
> "Mercer", "Mercier" (not too bad, yet), "Marker" (ok, but...), "Markwardt"
> (um...), "Margarucci" (now we're getting blank stares...), and against
> numeric values (like phone) it just dies. DIFFERENCE might get us there
> somehow, though it returns a 4 on every variation in the preceeding list
so
> we can't sort that way. I've seen some searches where they're sorted by
> "relevance", but I'm not sure how to make that happen.
>
> So I'm looking for ideas on making a "smart search". Thanks just for
> reading this far, and thanks for any help you can give.
>
> - Wm
>
>
> --
> William Morris
> Product Development, Seritas LLC
> Kansas City, Missouri
>
>
- Next message: news.microsoft.com: "Re: Run script on another NON-IIS server"
- Previous message: Bob Barrows: "Re: Bugs in VBScript GetRef()"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|