Re: Search methods, "fuzzy" logic, duplicate checking, oh my!

Tech-Archive recommends: Fix windows errors by optimizing your registry

From: John Kane (jt-kane_at_comcast.net)
Date: 03/02/04


Date: Mon, 1 Mar 2004 17:38:08 -0800

William,
Depending upon the SQL Server version (7.0 or 2000) as well as the
OS-platform (Win2K or Win2003) that you are using -- SELECT @@version --
Full Text Search (FTS) can be used to satisfy most of your search
requirements. Additionally, SQL FTS queries (CONTAINS and FREETEXT) can be
combined with SOUNDEX &/or DIFFERENCE, if necessary. If you're truly looking
for a "smart search", there are other methods using pure T-SQL that can get
you to the next level of search as I'm writing a book on this subject.

Note, you can also post FTS related questions to the newsgroup:
microsoft.public.sqlserver.fulltext (cc'ed)
Regards,
John

"William Morris" <news.remove.this.and.the.dots@seamlyne.com> wrote in
message news:c1iidi$1ipqfp$1@ID-205671.news.uni-berlin.de...
> Apologies in advance for the cross-post. Not sure if this is better
handled
> in ASP code or TransactSQL.
>
> Windows2000 Server
> SQL 2000
>
> tblPeople
> contactid int
> firstname varchar(25)
> lastname varchar(25)
> address varchar(255)
> city varchar(100)
> state int
> zip varchar(10)
> homephone varchar(25)
> workphone varchar(25)
> mobilephone varchar(25)
>
> Our contact search screen allows the users to search for a contact by
phone,
> lastname, firstname, city, state, zip. If no matches are found, then the
> user is given the option of adding the information into the database as a
> new contact. When searching, all whitespace and punctuation is removed
for
> the comparison.
>
> The database is populated two ways:
>
> 1. by hand by our users, where the format is controlled through the
> interface - i.e. phone numbers always have an area code, and the numbers
are
> stored with dashes. Numbers, dashes, spaces, and parentheses are the only
> allow characters.
> 2. Client data as CSV files from a variety of proprietary sources, and
> the data could have any ol' crap in it. We scrub as best we can, but some
> garbage gets through.
>
> So, what we find is when doing the phone number search, someone with
> 816-555-1234 won't be found if the user enters 555-1234, and vice versa.
> Likewise, if any of the names are mis-spelled, the search will fail.
>
> We'd like to make the query "smart". We've explored SOUNDEX, but the
> results that come back often confuse the end-user: "Mercer" comes back
with
> "Mercer", "Mercier" (not too bad, yet), "Marker" (ok, but...), "Markwardt"
> (um...), "Margarucci" (now we're getting blank stares...), and against
> numeric values (like phone) it just dies. DIFFERENCE might get us there
> somehow, though it returns a 4 on every variation in the preceeding list
so
> we can't sort that way. I've seen some searches where they're sorted by
> "relevance", but I'm not sure how to make that happen.
>
> So I'm looking for ideas on making a "smart search". Thanks just for
> reading this far, and thanks for any help you can give.
>
> - Wm
>
>
> --
> William Morris
> Product Development, Seritas LLC
> Kansas City, Missouri
>
>



Relevant Pages

  • Re: Search methods, "fuzzy" logic, duplicate checking, oh my!
    ... Depending upon the SQL Server version as well as the ... Full Text Search (FTS) can be used to satisfy most of your search ... > lastname, firstname, city, state, zip. ...
    (microsoft.public.sqlserver.programming)
  • Re: Search methods, "fuzzy" logic, duplicate checking, oh my!
    ... Depending upon the SQL Server version as well as the ... Full Text Search (FTS) can be used to satisfy most of your search ... > lastname, firstname, city, state, zip. ...
    (microsoft.public.sqlserver.fulltext)
  • Re: FTS 2005 vs index server
    ... SQL FTS 2005 is more scalable than Indexing Services. ... Looking for a SQL Server replication book? ...
    (microsoft.public.sqlserver.fulltext)
  • Re: Basics, just the basics please
    ... so for now there is no "SQL Full-Text Search for Dummy's" ... SQL Server 2000 Books online using the search tab and search on "full ... "Full-text Querying SQL Server Data". ... many SQL FTS issues are ...
    (microsoft.public.sqlserver.fulltext)
  • Re: FTS Performance in SQL 2005
    ... we had no end of problems with SQL FTS. ... Looking for a SQL Server replication book? ... The Memory Usage and VM Size never increase over about 65 MB and 20MB. ... cost relative to the whole batch, ...
    (microsoft.public.sqlserver.fulltext)