Re: What characters are considered as word boundaries



It's not configurable. Basically, all non-alphanumeric characters are not
indexed and are treated as word boundaries. There are some exceptions: the
handling of - in different languages, and the period. Something like F.B.I.
is indexed as both F.B.I. and FBI, whereas f.b.i. is broken into f, b,
and i.

The same applies to + and # after uppercase characters: C# is indexed as C#,
whereas c# is indexed as c. $10.00 is indexed as $10.00, whereas $test is
indexed as test.
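
To make the examples above concrete, here is a rough Python sketch that reproduces only the cases mentioned in this post. It is not the actual FTS word breaker, which is language-specific and not exposed like this; the function name toy_index_terms and the regular expression are my own assumptions, written just to mimic the behavior described.

```python
import re

# Toy rules that mirror the examples in this post. The real word breaker
# is language-specific and far more involved; this is only an illustration.
_TOKEN = re.compile(
    r"""
    (?P<acronym>(?:[A-Z]\.){2,})      # upper-case letters with periods
  | (?P<currency>\$\d+(?:\.\d+)?)     # dollar sign followed by a number
  | (?P<upper_sym>[A-Z]+[+#]+)        # upper-case letters followed by + or #
  | (?P<word>[A-Za-z0-9]+)            # plain alphanumeric run
    """,
    re.VERBOSE,
)


def toy_index_terms(text):
    """Return the terms this toy model would index for `text`."""
    terms = []
    for match in _TOKEN.finditer(text):
        token = match.group(0)
        terms.append(token)
        # An upper-case acronym is also indexed without its periods.
        if match.lastgroup == "acronym":
            terms.append(token.replace(".", ""))
    return terms


for sample in ["F.B.I.", "f.b.i.", "C#", "c#", "$10.00", "$test"]:
    print(sample, "->", toy_index_terms(sample))
```

Any character that matches none of the patterns is simply skipped, which is how the sketch models "not indexed and treated as a word boundary".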

--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



"yuan" <yuan@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:C6B39E6B-AC72-4F1A-AC76-D07B63AFD913@xxxxxxxxxxxxxxxx
Hi

I would like to know which characters are considered word boundaries in
FTS and whether there is a way to configure this list of characters.

My question is about the '/' character which doesn't seem to be a word
boundary in FTS and I would like to change this behavior.

Thanks




