Re: Indexing Service and hyphens

From: Hammad (Hammad_at_discussions.microsoft.com)
Date: Fri, 18 Feb 2005 06:29:03 -0800

Hi John,

Thanks for your quick response. The version I obtained from using that
command is the following:

Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Dec 17 2002 14:22:05
Copyright (c) 1988-2003 Microsoft Corporation Developer Edition on Windows
NT 5.1 (Build 2600: Service Pack 2)

I've done a little reading on word breakers, but I'm not sure how to
configure one programmatically for indexing, or whether that is even
necessary. I'm not exactly sure how the Indexing Service works, but I assume
that if it finds the word e-business in a document, it indexes e, business,
ebusiness, and e-business, because when I use the CissoQuery object and
specify the exact phrase "e-business" using Dialect 2, it does find it.

The only issue I have is how to specify a wildcard-type search, so that
typing "e-bus" finds every word that has e-bus as a prefix. If I don't put
quotes around e-business, the query returns documents that contain the
variations I listed above, so documents that don't contain e-business at all
show up. If I specify "e-bus" in quotes, it looks for the exact phrase rather
than for prefix matches, so it won't find documents containing words that
start with that prefix. Is it possible to do such a thing?

Thanks,

Hammad
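
To make the distinction in the question concrete, here is a minimal Python
sketch of the two behaviours being compared: an exact phrase match versus a
phrase match whose last token is treated as a prefix. It is a toy model, not
the Indexing Service or CissoQuery. The helper names (tokens, phrase_match)
are hypothetical, and the sketch assumes the word breaker splits "e-business"
into the tokens e and business. The Indexing Service query language documents
a trailing * as a prefix wildcard; whether that combines with a quoted,
hyphenated phrase the way this toy does is not verified here.

```python
import re


def tokens(text):
    """Toy word breaker (assumption): lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_match(doc, phrase, prefix=False):
    """Return True if the tokens of `phrase` occur adjacently in `doc`.

    With prefix=True the last query token only has to be a prefix of the
    corresponding document token, which models a search like e-bus*.
    """
    d, q = tokens(doc), tokens(phrase)
    if not q:
        return False
    for i in range(len(d) - len(q) + 1):
        window = d[i:i + len(q)]
        if window[:-1] != q[:-1]:
            continue  # leading tokens must match exactly
        if prefix:
            if window[-1].startswith(q[-1]):
                return True
        elif window[-1] == q[-1]:
            return True
    return False


doc = "Our e-business plan"
print(phrase_match(doc, "e-bus"))               # exact phrase
print(phrase_match(doc, "e-bus", prefix=True))  # prefix on the last token
```

In this model a document that merely contains e and business somewhere
apart does not match, because the tokens must be adjacent, and a document
containing only ebusiness does not match either, because it is a single token.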

"John Kane" wrote:

> Hammad,
> It might be best to post this question to
> microsoft.public.sqlserver.fulltext or
> microsoft.public.inetserver.indexserver newsgroups as this is a somewhat
> specialized area...
>
> The Indexing Service (IS) relies on OS-supplied word breakers, which
> determine how text in each language is broken into tokens. For example,
> a URL such as 'http://jtkane.com?search=what#is#my+name' that includes
> punctuation characters such as :, /, ?, =, and + is tokenized as follows
> on Windows Server 2003 and Windows XP using the LangWrbk.dll word
> breaker:
>
> Original text: 'http://jtkane.com?search=what#is#my+name'
> IWordSink::PutWord: cwcSrcLen 4, cwcSrcPos 0, cwc 4, 'http'
> IWordSink::PutWord: cwcSrcLen 6, cwcSrcPos 7, cwc 6, 'jtkane'
> IWordSink::PutWord: cwcSrcLen 3, cwcSrcPos 14, cwc 3, 'com'
> IWordSink::PutWord: cwcSrcLen 6, cwcSrcPos 18, cwc 6, 'search'
> IWordSink::PutWord: cwcSrcLen 4, cwcSrcPos 25, cwc 4, 'what'
> IWordSink::PutWord: cwcSrcLen 2, cwcSrcPos 30, cwc 2, 'is'
> IWordSink::PutWord: cwcSrcLen 2, cwcSrcPos 33, cwc 2, 'my'
> IWordSink::PutWord: cwcSrcLen 4, cwcSrcPos 36, cwc 4, 'name'
>
> However, on Windows 2000 Server the same URL will be tokenized as a single
> token using the infosoft.dll wordbreaker:
>
> Original text: 'http://jtkane.com?search=what#is#my+name'
> IWordSink::PutWord: cwcSrcLen 40, cwcSrcPos 0, cwc 39,
> 'http://jtkane.com?searchwhat#is#my+name'
>
> The same applies to SQL Server's Full-Text Search (FTS) component, since
> both it and the Indexing Service depend on the OS-supplied word breakers.
> Could you post the full output of -- SELECT @@version -- as this would be
> most helpful in troubleshooting your question?
>
> Thanks,
> John
> --
> SQL Full Text Search Blog
> http://spaces.msn.com/members/jtkane/
>
>
>
> "Hammad" <Hammad@discussions.microsoft.com> wrote in message
> news:80C23DCB-6475-4555-93D8-DD30DF5EA337@microsoft.com...
> > I am trying to search for a word such as "e-business" using the Indexing
> > Service Query object (CissoQuery). What I would like to do is search for
> > e-bus and get back variations of this term, e.g. e-business, e-busi. So
> > effectively, I would like to do a wildcard search. Unfortunately, when I
> > search for this term, it returns documents that do not have e-business in
> > them but variations of e (I have modified the noise list to remove noise
> > words) and business, as well as ebusiness. I don't want this to happen. I
> > can search for the phrase "e-business" and it returns the correct results.
> > However, if I search for "e-bus" it returns no results because it is
> > looking for the entire phrase. If I search for e-business without the
> > quotes, I get the variations I talked about earlier, from documents that
> > don't contain that phrase. How do I configure Indexing Service to return
> > results for terms with hyphens? I have yet to find a sufficient answer to
> > this question anywhere on the web. If this is a bug and cannot be done in
> > Indexing Service, please tell me and I will stop trying to figure this
> > out. I am aware that this is a general Indexing Service question, but I
> > know SQL Server uses the service internally, or something like it, so I
> > am posting this question to this newsgroup.
>
>
>
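
The PutWord trace quoted above for Windows Server 2003 and Windows XP can be
approximated with a short Python sketch that splits on every non-alphanumeric
character and reports each token's start position and length. This is only an
illustration of punctuation-based breaking, not the LangWrbk.dll algorithm;
the function name break_words and the ASCII-only character class are
assumptions made for the example.

```python
import re


def break_words(text):
    """Return (start_position, word) pairs for each alphanumeric run.

    Assumption: every non-alphanumeric ASCII character is a separator.
    Real word breakers are language-specific and more elaborate.
    """
    return [(m.start(), m.group()) for m in re.finditer(r"[A-Za-z0-9]+", text)]


url = "http://jtkane.com?search=what#is#my+name"
for pos, word in break_words(url):
    # Mirrors the cwcSrcPos / cwc fields shown in the quoted trace.
    print(f"cwcSrcPos {pos}, cwc {len(word)}, '{word}'")
```

Under the same assumption, break_words("e-business") yields the two tokens
e and business, which is consistent with unquoted queries matching documents
that contain those words separately.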


