Re: C# is ignored on 2000 Server in mssearch

From: Richard Yeo (RichardYeo_at_discussions.microsoft.com)
Date: 09/13/04


Date: Mon, 13 Sep 2004 10:45:07 -0700

John

Thanks for taking the time to reply to my post.

I came across another problem which is documented here along with my
solution to both of these problems.

http://msdn.microsoft.com/newsgroups/default.aspx?dg=microsoft.public.sqlserver.fulltext&mid=370c3f83-994a-4ec6-b8c6-e329378ea002&sloc=en-us

To summarise I have decided to avoid the pitfalls of the various Indexing
Service implementations and convert certain words and symbols, e.g. C# ->
CSHARP, etc. which I think you might have suggested in another post.

Thanks

Rich

"John Kane" wrote:

> Richard,
> Yes, I can. A specific fix for these single letter (C, J) and punctuation
> programming languages was based upon Upper-case letters for these
> Programming languages as they are specific product from Microsoft and other
> software vendors. This very specific fix was done in Windows 2000 SP2 in the
> infosoft.dll wordbreaker and the fix was carried forward to the Windows
> Server 2003 langwrbk.dll wordbreaker, although I've not tested these single
> letters and case-sensitivity on Win2003. See my many posts on the
> differences between the two OS platform's wordbreaker -
> http://groups.google.com/groups?q=langwrbk+infosoft
>
> In regards to your asp and not "asp.net" SQL FTS query, could you post the
> exact CONTAINS* or FREETEXT* query you are using with a sample of the exact
> content that you are searching against?
>
> Below is an example of using CONTAINSTABLE with a NOT condition:
> -- returns 2 rows when NOT, not included and 0 rows when NOT is included
> SELECT p.pub_id, p.pr_info, c.[rank]
> from pub_info AS p,
> containstable(pub_info, *, '"books" and NOT "publisher"') as c
> where c.[KEY] = p.pub_id
> order by c.[rank]
>
> Regards,
> John
> PS: Could you post your reply as a new thread as this is an old thread that
> will soon roll off the newsgroup server.
>
>
>
>
>
>
> "Richard S Yeo" <Richard S Yeo@discussions.microsoft.com> wrote in message
> news:B906CD4A-96E5-4B7B-B605-5B956FA7171C@microsoft.com...
> > Can anybody help?
> >
> > select @@version
> > Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Dec 17 2002 14:22:05
> > Copyright (c) 1988-2003 Microsoft Corporation Enterprise Edition on
> Windows
> > NT 5.2 (Build 3790: )
> >
> > select @@language
> > us_english
> >
> > The fields in the Catalog are specified as English (United States)
> >
> > MSSearch was restarted and full population performed (and population
> > completed) after each config change..
> >
> > With c present in the Noise files (noise.dat and noise.enu) in
> > C:\Program Files\Microsoft SQL Server\MSSQL\FTDATA\SQLServer\Config
> > C:\WINDOWS\system32
> > select count(*) from MyTable where contains(*, 'C')
> > Result = Server: Msg 7619
> >
> > With c removed from the Noise files (noise.dat and noise.enu) in
> > C:\Program Files\Microsoft SQL Server\MSSQL\FTDATA\SQLServer\Config
> > C:\WINDOWS\system32
> >
> > The following tests were carried out replacing C in SQL above. The number
> to
> > the right was the number of rows returned.
> >
> > c = 325
> > C = 325
> > "C" = 325
> > "c" = 325
> >
> > c# = 325
> > C# = 265
> > "C#" = 265
> > "c#" = 325
> >
> > c++ = 696
> > C++ = 548
> > "C++" = 548
> > "c++" = 696
> >
> > It was my understanding that MSSearch was case-insensitive. Any ideas why
> > this is and why c++ returns more rows than c# given c# returns the same
> > number of rows as c?
> >
> > I am also having problems with the following search
> > asp and not "asp.net"
> > in that it returns rows with asp.net in them. is this similar to the
> > c and not c++
> > search issue?
> >
> > Regards
> > Rich
> >
> > "John Kane" wrote:
> >
> > > Actually, C<++>, C<#>, J<++> and J<#> are the "exceptions" as there was
> a very specific fix for these "single letter + punctuation character"
> combinations that was incorporated into Windows 2000 Server SP3 and above by
> Microsoft for specificly searching for these programming languages. However,
> C & J (or any other single letters) were not included in the SP3 fix for the
> Win2K infosoft.dll wordbreaker.
> > >
> > > Below is the detail testing I've done on this subject using SQL Server
> 2000 and Windows 2000 Server with SP4:
> > >
> > > select TextCol from FTSPunc where CONTAINS(TextCol,'"C"') -- Search for
> C with double quotes
> > > -- Returns: Error Msg 7619, "A clause of the query contained only
> ignored words"
> > > -- Expected results: Error 7619 as C is still in the noise word files
> <-- Note, plural noise word files.
> > >
> > > select TextCol from FTSPunc where CONTAINS(TextCol,'C') -- Search for
> C without double quotes
> > > -- Returns: Error Msg 7619, "A clause of the query contained only
> ignored words"
> > > -- Expected results: Error 7619
> > >
> > > select TextCol from FTSPunc where CONTAINS(TextCol,'"C#"') -- Capital C
> Sharp and with double quotes
> > > -- Returns: Smith, John C# CSHARP A.;Roger, David
> > > -- Expected results: Error 7619, but becauase of specific "fix" in Win2K
> SP3, 1 row returned for BOTH upper and lower case c
> > >
> > > select TextCol from FTSPunc where CONTAINS(TextCol,'C#') -- Capital C
> Sharp and without double quotes
> > > -- Returns: Smith, John C# CSHARP A.;Roger, David
> > > -- Expected results: Error 7619, but becauase of specific "fix" in
> Win2K SP3, 1 row returned without double quotes
> > >
> > > select TextCol from FTSPunc where CONTAINS(TextCol,'"C++"') -- Search
> for C++
> > > -- Returns: Smith, Paul C++ CPlus, William A.;Keith, David
> > > -- Expected results: Error 7619, but because of ++ punctuation "next to"
> or "touching" C, it's not a noise letter
> > >
> > > select TextCol from FTSPunc where CONTAINS(TextCol,'"C-"') -- Search
> for C-
> > > -- Returns: Smith, Mary C- CMinus , William A.;Rick, David
> > > -- Expected results: Error 7619, but because of - punctuation "next to"
> or "touching" C, it's not a noise letter
> > >
> > > -- Summary:
> > > -- CONTAINS will find C++ c# C- but Error 7619 with C on Win2K SP3 with
> SQL2K RTM when C is in noise files. Additonally, FREETEXT will find C++ c#
> C- but NOT C without Error 7619 on Win2K SP3 with SQL2K RTM when C is in
> noise files. Note, this is the same summary that I provided Andrew in my
> intial reply to his question.
> > >
> > > Regards,
> > > John
> > > PS: I just wish MS would provide an offical KB article on this issue as
> I'm getting tired of answering this very FAQ as I'm sure you are too! :-)
> > >
> > >
> > > "Hilary Cotter" wrote:
> > >
> > > > c and j are exceptions to this rule IIRC.
> > > >
> > > > --
> > > > Hilary Cotter
> > > > Looking for a book on SQL Server replication?
> > > > http://www.nwsu.com/0974973602.html
> > > >
> > > >
> > > > "John Kane" <jt-kane@comcast.net> wrote in message
> > > > news:OsJihR0aEHA.4092@TK2MSFTNGP11.phx.gbl...
> > > > > Hilary, adding back the single letter 'C" will only cause the error
> Msg
> > > > 7619
> > > > > to reoccur, see the below SQL script for the results on Windows
> Server
> > > > 2003.
> > > > > At this time, I'm not able to re-test this on Windows 2000 Server,
> but see
> > > > > my first reply in this thread to Andy for a summary of results using
> both
> > > > > CONTAINS and FREETEXT on both Win2K and Win2003...
> > > > >
> > > > > Brijesh,
> > > > > While I currently using Windows Server 2003 and can't at this time
> boot
> > > > over
> > > > > to my Windows 2000 Server and test the folowing sql script with that
> OS
> > > > > supplied word breaker. However, the results you are seeing on
> Windows 2000
> > > > > Server with the single letter 'C' removed from noise.enu
> (US_English) is
> > > > the
> > > > > 'by design' results. It may not be what you are expecting, and it is
> not a
> > > > > bug, as Microsoft has indicated that these results with single
> letters are
> > > > > by design.
> > > > >
> > > > > use pubs
> > > > > go
> > > > > SELECT @@version
> > > > > /* -- returns on my server...
> > > > > Microsoft SQL Server 2000 - 8.00.760 (Intel X86)
> > > > > Dec 17 2002 14:22:05
> > > > > Copyright (c) 1988-2003 Microsoft Corporation
> > > > > Enterprise Edition on Windows NT 5.2 (Build 3790: )
> > > > > */
> > > > > go
> > > > >
> > > > > if exists (select * from sysobjects where id = object_id('FTSPunc'))
> > > > > drop table FTSPunc
> > > > > go
> > > > > CREATE TABLE FTSPunc (
> > > > > KeyCol int IDENTITY (1,1) NOT NULL
> > > > > CONSTRAINT FTSPunc_IDX PRIMARY KEY CLUSTERED,
> > > > > TextCol text NULL,
> > > > > VarcharCol varchar(25) NULL,
> > > > > CharCol char(30) NULL,
> > > > > TimeStampCol timestamp NULL
> > > > > ) ON [PRIMARY]
> > > > > go
> > > > >
> > > > > sp_help FTSPunc
> > > > > go
> > > > >
> > > > >
> > > > > -- Insert data...
> > > > > INSERT FTSPunc values('Smith, James C CONLY, William A.;Maloney,
> David',
> > > > > 'Exported from MasterCook', 'Neopolitan Chicken', NULL)
> > > > > INSERT FTSPunc values('Smith, John C# CSHARP A.;Roger, David',
> 'classic
> > > > > recipes', 'Neopolitan Beef', NULL)
> > > > > INSERT FTSPunc values('Smith, Paul C++ CPlus, William A.;Keith,
> David',
> > > > > 'FROZEN NEAPOLITAN TORTE', 'Neapolitan Torte', NULL)
> > > > > INSERT FTSPunc values('Smith, Mary C- CMinus , William A.;Rick,
> David',
> > > > > 'FROZEN NEAPOLITAN TORTE', 'Neapolitan Torte', NULL)
> > > > > go
> > > > >
> > > > > -- Select data, including Timestamp values...
> > > > > SELECT * from FTSPunc
> > > > > go
> > > > >
> > > > > -- Create FT Index
> > > > > exec sp_fulltext_catalog 'FTSPcCatalog','create'
> > > > > exec sp_fulltext_table
> 'FTSPunc','create','FTSPcCatalog','FTSPunc_IDX'
> > > > > exec sp_fulltext_column 'FTSPunc','VarcharCol','add'
> > > > > exec sp_fulltext_column 'FTSPunc','TextCol','add'
> > > > > exec sp_fulltext_column 'FTSPunc','CharCol','add'
> > > > > exec sp_fulltext_table 'FTSPunc', 'activate'
> > > > > go
> > > > >
> > > > > -- Start FT Indexing...
> > > > > exec sp_fulltext_catalog 'FTSPcCatalog','start_full'
> > > > > go
> > > > > -- Wait for FT Indexing to complete and check NT/Win2K Application
> log for
> > > > > success/errors..
> > > > >
> > > > > -- Test FTS...
> > > > > select TextCol from FTSPunc
> > > > > /*
> > > > > TextCol
> > > > > --------------------------------------------------
> > > > > Smith, James C CONLY, William A.;Maloney, David
> > > > > Smith, John C# CSHARP A.;Roger, David
> > > > > Smith, Paul C++ CPlus, William A.;Keith, David
> > > > > Smith, Mary C- CMinus , William A.;Rick, David
> > > > > */
> > > > >
> > > > > -- Note: The following FTS queries were executed with the single
> letter
> > > > 'C'
> > > > > is present in the noise.enu noise word file:
> > > > >
> > > > > select TextCol from FTSPunc where contains(TextCol,'c#') -- simple
> > > > > CONTAINS search word
> > > > > /* -- expected results:
> > > > > Server: Msg 7619, Level 16, State 1, Line 1
> > > > > Execution of a full-text operation failed. A clause of the query
> contained
> > > > > only ignored words.
> > > > > */
> > > > >
> > > > > Select [KEY], RANK,TextCol
> > > > > from ContainsTable(FTSPunc, *, 'c#') S, FTSPunc E
> > > > > Where E.KeyCol = S.[KEY]
> > > > > Order by Rank desc
> > > > > /* -- expected results:
> > > > > Server: Msg 7619, Level 16, State 1, Line 1
> > > > > Execution of a full-text operation failed. A clause of the query
> contained
> > > > > only ignored words.
> > > > > */
> > > > >
> > > > > -- Note: The following FTS queries were executed with the single
> letter
> > > > 'C'
> > > > > removed from the noise.enu.
> > > > > -- 1. Stop the "Microsoft Search" service
> > > > > -- 2. Edit noise.enu file under \FTDATA\SQLServer$SQL2K\Config,
> remove 'c'
> > > > > and save file.
> > > > > -- 3. Re-start the "Microsoft Search" service
> > > > > -- 4. Re-execute the below SQL statement and FTS queries...
> > > > >
> > > > > exec sp_fulltext_catalog 'FTSPcCatalog','start_full'
> > > > >
> > > > > select TextCol from FTSPunc where contains(TextCol,'c#') -- simple
> > > > > CONTAINS search word
> > > > > /* -- expected results on Windows Server 2003
> > > > > TextCol
> > > > > --------------------------------------------------
> > > > > Smith, James C CONLY, William A.;Maloney, David
> > > > > */
> > > > >
> > > > > Select [KEY], RANK,TextCol
> > > > > from ContainsTable(FTSPunc, *, 'c#') S, FTSPunc E
> > > > > Where E.KeyCol = S.[KEY]
> > > > > Order by Rank desc
> > > > > /* -- expected results:
> > > > > KEY RANK TextCol
> > > >
> > ----------- ----------- --------------------------------------------------
> > > > > 1 48 Smith, James C CONLY, William A.;Maloney,
> David
> > > > > */
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > "Hilary Cotter" <hilaryk@att.net> wrote in message
> > > > > news:eRKOSUzaEHA.2408@tk2msftngp13.phx.gbl...
> > > > > > put the c back in your noise word list and rebuild your catalog.
> > > > > >
> > > > > > --
> > > > > > Hilary Cotter
> > > > > > Looking for a book on SQL Server replication?
> > > > > > http://www.nwsu.com/0974973602.html
> > > > > >
> > > > > >
> > > > > > "brijesh Tekrawala" <brijesh@jzero.com> wrote in message
> > > > > > news:e0xZMGyaEHA.3752@TK2MSFTNGP12.phx.gbl...
> > > > > > > Hilary,
> > > > > > >
> > > > > > > it is returning wrong results...
> > > > > > >
> > > > > > > c Programming
> > > > > > > c++ Programming
> > > > > > > C# Programming
> > > > > > > ------------------------------
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > it should return only rows containing C# word.... so only one
> row....
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > *** Sent via Developersdex http://www.developersdex.com ***
> > > > > > > Don't just participate in USENET...get rewarded for it!
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
>
>
>



Relevant Pages

  • Re: Luciano Berio
    ... The quotes around retrospective served two very clear and accurate ... If both you and David understood "highlighting" ... "With the New York Philharmonic *highlighting* Luciano Berio in the ...
    (rec.music.classical.recordings)
  • Re: All links to Internet Explorer not working....Solution found
    ... After two full days of searching for a solution, I found it with just two ... David:) ... You click on a hyperlink and an IE window opens but it doesn't load: ... > Are You Ready for WinXP SP2? ...
    (microsoft.public.windows.inetexplorer.ie6_outlookexpress)
  • Searching: PALLETT
    ... I am searching for information on David and Minnie PALLETT. ... Have searched Jewishgen burials South Africa, ... on Jewish Genealogy ...
    (soc.genealogy.jewish)
  • Re: Ive been quoted!
    ... Searching for Neb brings up some quotes that are clearly about me, ... But what about these quotes under the word Cory: ... I can't find it on google groups - but that last one is typical of something ...
    (soc.culture.scottish)
  • Re: Luciano Berio
    ... Second, David was quoting his own post, so the use of quotes was ... If both you and David understood "highlighting" ... Sorry Jeff, a good friend, will gently help him overcome a stubborn mistake, even on matters of intellectual honesty, and not serve as his enabler. ... I find both that both you and David Gable serve this group with valuable information. ...
    (rec.music.classical.recordings)