RE: search phrase



Yair,
What you might want to try is using the SQL Server 2000 XML thersoursa files
to include the company variations you are looking for. Note, that while this
option is un-supported by Microsoft, I have directly confirmed with the SQL
Dev team that this is functional using only FREETEXT or FREETEXTTABLE under
SP3 as well.

See KB article 309678 "FIX: Thesaurus Support is Not Available for Full-Text
Search" at
http://support.microsoft.com/default.aspx?scid=kb;en-us;309678&Product=sql2k
for details. Below are the steps necessary to enable this functionality.

use pubs
go

CREATE TABLE test_07062001 (
PK int NOT NULL IDENTITY (1, 1),
col char(50),
PRIMARY KEY (PK)
)

insert test_07062001 (col) values('KW')
insert test_07062001 (col) values('Karen Delaney')
insert test_07062001 (col) values('Kalen Delaney')
insert test_07062001 (col) values('Karen Waterson')
insert test_07062001 (col) values('Karen Watterson')
insert test_07062001 (col) values('KWatterson')
insert test_07062001 (col) values('KarenW')
insert test_07062001 (col) values('Sunset Boulevard')
insert test_07062001 (col) values('Sunset Blvd')
insert test_07062001 (col) values('Blvd of Broken Dreams')
insert test_07062001 (col) values('Boulevard of Broken Dreams')
insert test_07062001 (col) values('Stephen Dybing')
insert test_07062001 (col) values('Steven Dybing')

then build your FTS Index
sp_fulltext_database 'enable'
sp_fulltext_catalog 'Karen', 'create'
sp_fulltext_table test_07062001, 'create', 'Karen', 'PK'
sp_fulltext_column test_07062001, 'col', 'add', 1033
sp_fulltext_table test_07062001, 'activate'

After the Full Population is complete, run the following SQL FREETEXT queries

select * from test_07062001 where freetext(col,'karen') -- 3 hits
select * from test_07062001 where freetext(col,'steven') -- 1 hit
select * from test_07062001 where freetext(col,'boulevard') -- 2 hits

Navigate to c:\Program Files\Microsoft SQL
Server\MSSQL\FTDATA\SQLServer\Config find a file named tseng.xml (note that
this is for UK English and you need US English)
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\ContentIndexCommon\LanguageResources\Override\SQLServer\English
(United States) and changing TsaurusFile to read tsenu.xml - be sure to
include the full path if you do).

Modify your tseng.xml file and save it.

<XML ID="Tahoe Thesaurus">
<thesaurus xmlns="x-schema:tsSchema.xml">
<replacement>
<pat>steven</pat>
<sub weight="1.0">Stephen</sub>
</replacement>
<replacement>
<pat>blvd</pat>
<sub weight="1.0">Boulevard</sub>
</replacement>
<replacement>
<pat>KW</pat>
<pat>KarenW</pat>
<pat>KWatterson</pat>
<pat>Karen Waterson</pat>
<sub weight="1.0">Karen Watterson</sub>
</replacement>
<replacement>
<pat>colour</pat>
<sub weight="1.0">color</sub>
</replacement>
<expansion>
<sub weight="0.5">run**</sub>
<sub weight="0.5">jog**</sub>
</expansion>
</thesaurus>
</XML>
*/

Now, re-run the following SQL FREETEXT queries & compare the results from
above:

select * from test_07062001 where freetext(col,'karen')
select * from test_07062001 where freetext(col,'steven')
select * from test_07062001 where freetext(col,'boulevard')

Hope that helps!
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/



"Yair Nissan" wrote:
> Hi,
>
> We are using SQL 2000 sp3. We have a Customers table in which the customer's
> name is FTS indexed.. Since the insertion into the DB is made manually the
> customers' names are often misspelled or written with small modifications
> from the already existing names in our DB. The table is using
> SQL_Latin1_General_CP1_CI_AS collation.
>
> We've removed all the content of the noise files since our customers are
> from all over the world and we need to remove a different set of noise words
> for each country.
>
> I've been testing what the best search phrase for the customer name might
> be, and currently we're using the following algorithm :
>
> 1. Remove all non alphanumeric characters from the customer name.
> 2. Remove the country oriented "noise" words - such as Inc, LTD for US
> customers , Gmbh of German customers etc .
> 3. Split the customer name using the white space as the delimiter.
> 4. Start querying the DB :
> a. A FreeTextTable query with the full customer name (after stages
> 1-3). If one of the results (after performing stages 1-3 on each of them) is
> the exact match it means we found the customer in the DB and we take the
> appropriate actions.
> b. If not, we remove the last word and go on until less than ((number
> of words in the original customer name) / 2) words are left. If we get to
> that point it means that there is no record for the customer and we add a
> new one.
>
> This query gives us the best results (performance is a different issue...)
> so far, however very often this algorithm doesn't work, especially when the
> customer name was entered with an extra white space ("Sun Microsystems" and
> "Sun Micro Systems") or using the plural version of the word ("Sun
> Microsystem" AND "Sun Microsystems").
>
> I've been looking all over the net for good articles about that subject but
> found very few. I would appreciate any help you can provide.
>
> Thanks,
> Yair

.



Relevant Pages

  • Re: OOP - a question about database access
    ... >>and project so much better in SQL DBMSes than in ODBMSes, ... >>100x more bytes from the database, just because you want your objects to ... > a related invoice.. ... > assoication from the customer to the invoice collection and have done ...
    (comp.object)
  • Re: object databases
    ... Now, tell me, you can do all that with 2 lines of code with SQL ... didn't provide the Customer, Order and OrderDetail classes. ... If you try to map classes to tables, ...
    (comp.object)
  • Re: sql problem
    ... Account # is the link field on ... The requirement is I have to come up with a report comparing the two ... customer tables to see if they are in sync. ... what I wanted to do was to get the SQL to return the "like" names ...
    (comp.databases.oracle.server)
  • Re: search phrase
    ... SQL Full Text Search Blog ... > Dev team that this is functional using only FREETEXT or FREETEXTTABLE ... Below are the steps necessary to enable this functionality. ... Remove all non alphanumeric characters from the customer name. ...
    (microsoft.public.sqlserver.fulltext)
  • Re: OOP - a question about database access
    ... >> using a tool that generates it and hand coding little or no SQL? ... > between objects and the controller, plus the mapping files, plus the HQL ... imagine I have a number of screens that display customer name ... I select a customer to edit and change the name. ...
    (comp.object)

Quantcast