Re: CONTAINS and non-wildcard asterisk
From: Hilary Cotter (hilaryk_at_att.net)
Date: 08/03/04
- Next message: Hilary Cotter: "Re: Reindex systems tables"
- Previous message: Hilary Cotter: "Re: Full Text Search - How do I best update Index - Merge Replication"
- In reply to: Johnny Nielsen: "Re: CONTAINS and non-wildcard asterisk"
- Next in thread: Johnny Nielsen: "Re: CONTAINS and non-wildcard asterisk"
- Reply: Johnny Nielsen: "Re: CONTAINS and non-wildcard asterisk"
- Messages sorted by: [ date ] [ thread ]
Date: Mon, 2 Aug 2004 21:02:03 -0400
just to clarify further you get the same result searching on test* as you
get searching on test!, or test#.
Here is another stupid SQL FTS trick
try this
select * from tablename where contains(*,'#test*$')
or
select * from tablename where contains(*,'$test*$')
or
select * from tablename where contains(*,'"test*"')
Whatever delimiter you use it still works, even if you mix the delimiter
types.
-- Hilary Cotter Looking for a book on SQL Server replication? http://www.nwsu.com/0974973602.html "Johnny Nielsen" <DONT.WRITE.THIStrap402@megabit.dk> wrote in message news:uqCdXGNeEHA.3212@TK2MSFTNGP10.phx.gbl... > OK, thanks for spending your time Hilary. > > Having my observations confirmed I now know how not to proceed :-) > > I really should stop trusting those MS examples... > > > Best regards > Johnny*Nielsen > > > "Hilary Cotter" <hilaryk@att.net> wrote in message > news:O$0froMeEHA.3512@TK2MSFTNGP12.phx.gbl... > > According to my tests with the neutral and US Englishword breaker the * is > > treated as white space when you have a single search argument and you > > wildcard it, and you wrap it in single quotes. > > > > In other words a search on test will return the same hits on a search on > > test*. > > > > If you wrap the phrase with double quotes you will get wildcarding, but it > > will still match with test*. > > > > Think of it this way. While indexing when MSSearch sees non white space, > non > > alphanumeric characters it strips them out and indexes the alphanumeric > > characters. > > > > So test* is indexed as test and is indistinguishable from test. > > > > When you search on test* the * is stripped off and the search is performed > > on test. > > > > Confusing I know. > > > > HTH > > -- > > Hilary Cotter > > Looking for a book on SQL Server replication? > > http://www.nwsu.com/0974973602.html > > > > > > "Johnny Nielsen" <DONT.WRITE.THIStrap402@megabit.dk> wrote in message > > news:%236Y8P9KeEHA.2812@tk2msftngp13.phx.gbl... > > > "Hilary Cotter" <hilaryk@att.net> wrote in message > > > news:%23itsZkJeEHA.3512@TK2MSFTNGP12.phx.gbl... > > > > Yes that is correct. The subtlety is that non alphanumeric characters > > with > > > > some exceptions are treated as white space. > > > > > > ..but as I quoted from the MS online books the asterisk is *not* treated > > as > > > white space when used inside single-quotes - that's my point. To repeat > > the > > > MS statement from the online books, that I originally questioned: [If > the > > > text and asterisk are not delimited by double quotation marks, as in > > > CONTAINS (column, 'text*'), full-text search considers the asterisk as a > > > character and will search for exact matches to text*.] > > > > > > > > > > So test and test* are both indexed as test. So a search on test will > > > return > > > > hits to test and test*. > > > > > > Well, according to MS the asterisk in single-quotes is treated as a > > > character and therefore test and test* should be considered two > different > > > words. That is, contains(*,'test*') should never match the word test. > > > > > > > > > > All text (even ascii) is indexed as its unicode equivalent. Its > > > punctuation > > > > marks which are treated as white space and you get this problem with. > > > > > > Sadly this is rather an additional problem. Building a parser is hard > job > > > when the precise definition of what is considered punctuation symbols > and > > > indexable characters isn't available to the public :-( > > > > > > > > > Best regards > > > Johnny Nielsen > > > > > > > > > > > -- > > > > Hilary Cotter > > > > Looking for a book on SQL Server replication? > > > > http://www.nwsu.com/0974973602.html > > > > > > > > > > > > "Johnny Nielsen" <DONT.WRITE.THIStrap402@megabit.dk> wrote in message > > > > news:u5Jc8EBeEHA.1356@TK2MSFTNGP09.phx.gbl... > > > > > "Hilary Cotter" <hilaryk@att.net> wrote in message > > > > > news:%23I4gCIAeEHA.3132@TK2MSFTNGP11.phx.gbl... > > > > > > No, I am saying Select * from tablename where contains(*,'test*') > > will > > > > > > return hits to test and test*. > > > > > > > > > > I'm sorry Hillary, but now I'm really confused. According to the MS > > > > > documentation that you seem to agree upon the second asterisk in > > > > > CONTAINS(*,'test*') isn't acting as a wildcard but is a part of the > > word > > > > > searched for. > > > > > > > > > > Then how come it will match the word test without an appended > asterisk > > ? > > > > > > > > > > That would be similar to CONTAINS(*,'server') matching the word > serve > > > > > without the appended 'r'. > > > > > > > > > > > > > > > > With Select * from tablename where contains(*,'"test*"') I will > get > > > hits > > > > > to > > > > > > test, test*, and tested as it is now wildcarding > > > > > > > > > > I agree and I have now problem understanding the functionality of > the > > > > > double-quotes - they seem to work as expected :-) > > > > > > > > > > It is only the functionality of the single-quoted cases that > troubles > > > me. > > > > > > > > > > > > > > > > You can't really change the behavior of the wildcarding by joining > > it > > > to > > > > > > other clauses, unless perhaps you do a join to a like that looks > > like > > > > this > > > > > > > > > > > > like '% test %' > > > > > > > > > > OK, I just tried to find out what you meant by "the behavior changes > > if > > > > you > > > > > have multiple search arguements in your search phrase" .. you didn't > > > mean > > > > > that the behavior of the single-quoted term then. > > > > > > > > > > > > > > > PS: This has nothing to do with the current subject, but do you > happen > > > to > > > > > know the valid range of full-text searchable unicode characters ? I > > > can't > > > > > seem to find a clear definition in the MS documentation .. not even > a > > > > > definition of 'punctuation' symbols. > > > > > > > > > > > > > > > Best regards > > > > > Johnny Nielsen > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Hilary Cotter > > > > > > Looking for a book on SQL Server replication? > > > > > > http://www.nwsu.com/0974973602.html > > > > > > > > > > > > > > > > > > "Johnny Nielsen" <DONT.WRITE.THIStrap402@megabit.dk> wrote in > > message > > > > > > news:OXHRNu$dEHA.3864@TK2MSFTNGP10.phx.gbl... > > > > > > > "Hilary Cotter" <hilaryk@att.net> wrote in message > > > > > > > news:Oz0AhW8dEHA.3664@TK2MSFTNGP12.phx.gbl... > > > > > > > > This is correct for a single search argument in your search > > > phrase. > > > > > > > > > > > > > > So, just to be sure I understand you correctly - if you perform > a > > > > SELECT > > > > > * > > > > > > > FROM SOMETABLE WHERE CONTAINS(*,'test*') you will only match > words > > > > > > > containing the five characters 'test*' and not the > four-character > > > word > > > > > > > 'test' ? > > > > > > > > > > > > > > On my system CONTAINS(*,'test*') and CONTAINS(*,' "test*" ') > seems > > > to > > > > > > return > > > > > > > the same results (it treats the asterisk in 'test*' as a > wildcard > > > > which > > > > > is > > > > > > > not what I want). > > > > > > > > > > > > > > > > > > > > > > The behavior changes if you have multiple search arguements in > > > your > > > > > > search > > > > > > > phrase. > > > > > > > > > > > > > > > > > > > > > Can the behavior of the single-quoted simple term 'test*' change > > by > > > > > > joining > > > > > > > it to other arguments ? > > > > > > > > > > > > > > I know the meaning of double-quotes in prefix terms, but I'm > only > > > > > > interested > > > > > > > in the single-quoted non-prefix cases (I'm writing a > > > search-condition > > > > > > parser > > > > > > > which is why the specific workings are important to me). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > With double quotation marks wrapping around your search phrase > > it > > > > will > > > > > > > > wildcard, so results will match the prefix or beginnings (what > > > comes > > > > > > > before > > > > > > > > the *), but will have different suffixes or endings, and it > will > > > > match > > > > > > > with > > > > > > > > words or tokens in your table that has an * at the end. The * > > will > > > > be > > > > > > > > applied to all search arguments. > > > > > > > > > > > > > > > > so a search on "al anon*" will match with al anon, alcoholics > > > > > anonymous, > > > > > > > > allan anont, but not and mike anon*. > > > > > > > > > > > > > > > > If you search without the double quotation marks and you have > > > > multiple > > > > > > > > search arguments > > > > > > > > > > > > > > > > ie > > > > > > > > > > > > > > > > select *from authors where contains(*,'al anon*') > > > > > > > > > > > > > > > > your query will bomb with > > > > > > > > > > > > > > > > Server: Msg 7631, Level 15, State 1, Line 1 > > > > > > > > Syntax error occurred near 'anon*'. Expected ''''' in search > > > > condition > > > > > > 'al > > > > > > > > anon*'. > > > > > > > > > > > > > > > > If you query on a single word or token with no double > quotation > > > > marks > > > > > > you > > > > > > > > will get not wildcarding, but will get an exact match to the > > anon, > > > > and > > > > > > > > anon*. > > > > > > > > > > > > > > > > If you query on a single word or token and wrap it in double > > > > quotation > > > > > > > marks > > > > > > > > you will get wildcarding and it will match with anon and > anon*. > > > > > > > > > > > > > > > > So for a single search word or token in your search phrase you > > > need > > > > to > > > > > > > wrap > > > > > > > > your search phrase in double quotes to get wildcarding. > > > > > > > > > > > > > > > > For multiple search arguments, words or tokens in your search > > > phrase > > > > > you > > > > > > > > need to wrap your search phrase in double quotes to get > > > wildcarding, > > > > > but > > > > > > > all > > > > > > > > search arguments will be wildcarded and you will get matches > to > > > > tokens > > > > > > or > > > > > > > > words in your table ending with the * and matching the search > > > > argument > > > > > > > > exactly. > > > > > > > > > > > > > > > > In FreeText the * will be ignored, but will be return hits to > > > > matches > > > > > to > > > > > > > the > > > > > > > > search arguments which are stemmed or unstemmed, and end in an > * > > > and > > > > > > don't > > > > > > > > end with the *. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Hilary Cotter > > > > > > > > Looking for a book on SQL Server replication? > > > > > > > > http://www.nwsu.com/0974973602.html > > > > > > > > > > > > > > > > > > > > > > > > "Johnny Nielsen" <DONT.WRITE.THIStrap402@megabit.dk> wrote in > > > > message > > > > > > > > news:unzSUC6dEHA.1644@tk2msftngp13.phx.gbl... > > > > > > > > > To quote the MS SQL Server 2000's online book on CONTAINS: > > > > > > > > > > > > > > > > > > "If the text and asterisk are not delimited by double > > quotation > > > > > marks, > > > > > > > as > > > > > > > > in > > > > > > > > > CONTAINS (column, 'text*'), full-text search considers the > > > > asterisk > > > > > as > > > > > > a > > > > > > > > > character and will search for exact matches to text*." > > > > > > > > > > > > > > > > > > Can anybody confirm the above to be true ? > > > > > > > > > > > > > > > > > > It is my experience that MS documentation in general - and > in > > > > > > particular > > > > > > > > > their examples - contains a lot of errors, but here they > > > > > specifically > > > > > > > > > mention that is should be possible to full-text search for > > > 'text*' > > > > > not > > > > > > > > > counting the asterisk as a wildcard. When I try to full-text > > > > search > > > > > > for > > > > > > > > > 'text*' using CONTAINS the result includes records > containing > > > > 'text' > > > > > > > > without > > > > > > > > > the asterisk. > > > > > > > > > > > > > > > > > > > > > > > > > > > Best regards > > > > > > > > > Johnny Nielsen > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best regards > > > > > > > Johnny Nielsen > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
- Next message: Hilary Cotter: "Re: Reindex systems tables"
- Previous message: Hilary Cotter: "Re: Full Text Search - How do I best update Index - Merge Replication"
- In reply to: Johnny Nielsen: "Re: CONTAINS and non-wildcard asterisk"
- Next in thread: Johnny Nielsen: "Re: CONTAINS and non-wildcard asterisk"
- Reply: Johnny Nielsen: "Re: CONTAINS and non-wildcard asterisk"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|