Re: CONTAINS and non-wildcard asterisk

Tech-Archive recommends: Repair Windows Errors & Optimize Windows Performance

From: Hilary Cotter (hilaryk_at_att.net)
Date: 08/03/04


Date: Mon, 2 Aug 2004 21:02:03 -0400

just to clarify further you get the same result searching on test* as you
get searching on test!, or test#.

Here is another stupid SQL FTS trick

try this

select * from tablename where contains(*,'#test*$')

or

select * from tablename where contains(*,'$test*$')

or

select * from tablename where contains(*,'"test*"')

Whatever delimiter you use it still works, even if you mix the delimiter
types.

-- 
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"Johnny Nielsen" <DONT.WRITE.THIStrap402@megabit.dk> wrote in message
news:uqCdXGNeEHA.3212@TK2MSFTNGP10.phx.gbl...
> OK, thanks for spending your time Hilary.
>
> Having my observations confirmed I now know how not to proceed :-)
>
> I really should stop trusting those MS examples...
>
>
> Best regards
> Johnny*Nielsen
>
>
> "Hilary Cotter" <hilaryk@att.net> wrote in message
> news:O$0froMeEHA.3512@TK2MSFTNGP12.phx.gbl...
> > According to my tests with the neutral and US Englishword breaker the *
is
> > treated as white space when you have a single search argument and you
> > wildcard it, and you wrap it in single quotes.
> >
> > In other words a search on test will return the same hits on a search on
> > test*.
> >
> > If you wrap the phrase with double quotes you will get wildcarding, but
it
> > will still match with test*.
> >
> > Think of it this way. While indexing when MSSearch sees non white space,
> non
> > alphanumeric characters it strips them out and indexes the alphanumeric
> > characters.
> >
> > So test* is indexed as test and is indistinguishable from test.
> >
> > When you search on test* the * is stripped off and the search is
performed
> > on test.
> >
> > Confusing I know.
> >
> > HTH
> > -- 
> > Hilary Cotter
> > Looking for a book on SQL Server replication?
> > http://www.nwsu.com/0974973602.html
> >
> >
> > "Johnny Nielsen" <DONT.WRITE.THIStrap402@megabit.dk> wrote in message
> > news:%236Y8P9KeEHA.2812@tk2msftngp13.phx.gbl...
> > > "Hilary Cotter" <hilaryk@att.net> wrote in message
> > > news:%23itsZkJeEHA.3512@TK2MSFTNGP12.phx.gbl...
> > > > Yes that is correct. The subtlety is that non alphanumeric
characters
> > with
> > > > some exceptions are treated as white space.
> > >
> > > ..but as I quoted from the MS online books the asterisk is *not*
treated
> > as
> > > white space when used inside single-quotes - that's my point. To
repeat
> > the
> > > MS statement from the online books, that I originally questioned: [If
> the
> > > text and asterisk are not delimited by double quotation marks, as in
> > > CONTAINS (column, 'text*'), full-text search considers the asterisk as
a
> > > character and will search for exact matches to text*.]
> > >
> > >
> > > > So test and test* are both indexed as test. So a search on test will
> > > return
> > > > hits to test and test*.
> > >
> > > Well, according to MS the asterisk in single-quotes is treated as a
> > > character and therefore test and test* should be considered two
> different
> > > words. That is, contains(*,'test*') should never match the word test.
> > >
> > >
> > > > All text (even ascii) is indexed as its unicode equivalent. Its
> > > punctuation
> > > > marks which are treated as white space and you get this problem
with.
> > >
> > > Sadly this is rather an additional problem. Building a parser is hard
> job
> > > when the precise definition of what is considered punctuation symbols
> and
> > > indexable characters isn't available to the public :-(
> > >
> > >
> > > Best regards
> > > Johnny Nielsen
> > >
> > > >
> > > > -- 
> > > > Hilary Cotter
> > > > Looking for a book on SQL Server replication?
> > > > http://www.nwsu.com/0974973602.html
> > > >
> > > >
> > > > "Johnny Nielsen" <DONT.WRITE.THIStrap402@megabit.dk> wrote in
message
> > > > news:u5Jc8EBeEHA.1356@TK2MSFTNGP09.phx.gbl...
> > > > > "Hilary Cotter" <hilaryk@att.net> wrote in message
> > > > > news:%23I4gCIAeEHA.3132@TK2MSFTNGP11.phx.gbl...
> > > > > > No, I am saying Select * from tablename where
contains(*,'test*')
> > will
> > > > > > return hits to test and test*.
> > > > >
> > > > > I'm sorry Hillary, but now I'm really confused. According to the
MS
> > > > > documentation that you seem to agree upon the second asterisk in
> > > > > CONTAINS(*,'test*') isn't acting as a wildcard but is a part of
the
> > word
> > > > > searched for.
> > > > >
> > > > > Then how come it will match the word test without an appended
> asterisk
> > ?
> > > > >
> > > > > That would be similar to CONTAINS(*,'server') matching the word
> serve
> > > > > without the appended 'r'.
> > > > >
> > > > >
> > > > > > With Select * from tablename where contains(*,'"test*"') I will
> get
> > > hits
> > > > > to
> > > > > > test, test*, and tested as it is now wildcarding
> > > > >
> > > > > I agree and I have now problem understanding the functionality of
> the
> > > > > double-quotes - they seem to work as expected :-)
> > > > >
> > > > > It is only the functionality of the single-quoted cases that
> troubles
> > > me.
> > > > >
> > > > >
> > > > > > You can't really change the behavior of the wildcarding by
joining
> > it
> > > to
> > > > > > other clauses, unless perhaps you do a join to a like that looks
> > like
> > > > this
> > > > > >
> > > > > > like '% test %'
> > > > >
> > > > > OK, I just tried to find out what you meant by "the behavior
changes
> > if
> > > > you
> > > > > have multiple search arguements in your search phrase" .. you
didn't
> > > mean
> > > > > that the behavior of the single-quoted term then.
> > > > >
> > > > >
> > > > > PS: This has nothing to do with the current subject, but do you
> happen
> > > to
> > > > > know the valid range of full-text searchable unicode characters ?
I
> > > can't
> > > > > seem to find a clear definition in the MS documentation .. not
even
> a
> > > > > definition of 'punctuation' symbols.
> > > > >
> > > > >
> > > > > Best regards
> > > > > Johnny Nielsen
> > > > >
> > > > >
> > > > > >
> > > > > > -- 
> > > > > > Hilary Cotter
> > > > > > Looking for a book on SQL Server replication?
> > > > > > http://www.nwsu.com/0974973602.html
> > > > > >
> > > > > >
> > > > > > "Johnny Nielsen" <DONT.WRITE.THIStrap402@megabit.dk> wrote in
> > message
> > > > > > news:OXHRNu$dEHA.3864@TK2MSFTNGP10.phx.gbl...
> > > > > > > "Hilary Cotter" <hilaryk@att.net> wrote in message
> > > > > > > news:Oz0AhW8dEHA.3664@TK2MSFTNGP12.phx.gbl...
> > > > > > > > This is correct for a single search argument in your search
> > > phrase.
> > > > > > >
> > > > > > > So, just to be sure I understand you correctly - if you
perform
> a
> > > > SELECT
> > > > > *
> > > > > > > FROM SOMETABLE WHERE CONTAINS(*,'test*') you will only match
> words
> > > > > > > containing the five characters 'test*' and not the
> four-character
> > > word
> > > > > > > 'test' ?
> > > > > > >
> > > > > > > On my system CONTAINS(*,'test*') and CONTAINS(*,' "test*" ')
> seems
> > > to
> > > > > > return
> > > > > > > the same results (it treats the asterisk in 'test*' as a
> wildcard
> > > > which
> > > > > is
> > > > > > > not what I want).
> > > > > > >
> > > > > > >
> > > > > > > > The behavior changes if you have multiple search arguements
in
> > > your
> > > > > > search
> > > > > > > phrase.
> > > > > > >
> > > > > > >
> > > > > > > Can the behavior of the single-quoted simple term 'test*'
change
> > by
> > > > > > joining
> > > > > > > it to other arguments ?
> > > > > > >
> > > > > > > I know the meaning of double-quotes in prefix terms, but I'm
> only
> > > > > > interested
> > > > > > > in the single-quoted non-prefix cases (I'm writing a
> > > search-condition
> > > > > > parser
> > > > > > > which is why the specific workings are important to me).
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > With double quotation marks wrapping around your search
phrase
> > it
> > > > will
> > > > > > > > wildcard, so results will match the prefix or beginnings
(what
> > > comes
> > > > > > > before
> > > > > > > > the *), but will have different suffixes or endings, and it
> will
> > > > match
> > > > > > > with
> > > > > > > > words or tokens in your table that has an * at the end. The
*
> > will
> > > > be
> > > > > > > > applied to all search arguments.
> > > > > > > >
> > > > > > > > so a search on "al anon*" will match with al anon,
alcoholics
> > > > > anonymous,
> > > > > > > > allan anont, but not and mike anon*.
> > > > > > > >
> > > > > > > > If you search without the double quotation marks and you
have
> > > > multiple
> > > > > > > > search arguments
> > > > > > > >
> > > > > > > > ie
> > > > > > > >
> > > > > > > > select *from authors where contains(*,'al anon*')
> > > > > > > >
> > > > > > > > your query will bomb with
> > > > > > > >
> > > > > > > > Server: Msg 7631, Level 15, State 1, Line 1
> > > > > > > > Syntax error occurred near 'anon*'. Expected ''''' in search
> > > > condition
> > > > > > 'al
> > > > > > > > anon*'.
> > > > > > > >
> > > > > > > > If you query on a single word or token with no double
> quotation
> > > > marks
> > > > > > you
> > > > > > > > will get not wildcarding, but will get an exact match to the
> > anon,
> > > > and
> > > > > > > > anon*.
> > > > > > > >
> > > > > > > > If you query on a single word or token and wrap it in double
> > > > quotation
> > > > > > > marks
> > > > > > > > you will get wildcarding and it will match with anon and
> anon*.
> > > > > > > >
> > > > > > > > So for a single search word or token in your search phrase
you
> > > need
> > > > to
> > > > > > > wrap
> > > > > > > > your search phrase in double quotes to get wildcarding.
> > > > > > > >
> > > > > > > > For multiple search arguments, words or tokens in your
search
> > > phrase
> > > > > you
> > > > > > > > need to wrap your search phrase in double quotes to get
> > > wildcarding,
> > > > > but
> > > > > > > all
> > > > > > > > search arguments will be wildcarded and you will get matches
> to
> > > > tokens
> > > > > > or
> > > > > > > > words in your table ending with the * and matching the
search
> > > > argument
> > > > > > > > exactly.
> > > > > > > >
> > > > > > > > In FreeText the * will be ignored, but will be return hits
to
> > > > matches
> > > > > to
> > > > > > > the
> > > > > > > > search arguments which are stemmed or unstemmed, and end in
an
> *
> > > and
> > > > > > don't
> > > > > > > > end with the *.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > -- 
> > > > > > > > Hilary Cotter
> > > > > > > > Looking for a book on SQL Server replication?
> > > > > > > > http://www.nwsu.com/0974973602.html
> > > > > > > >
> > > > > > > >
> > > > > > > > "Johnny Nielsen" <DONT.WRITE.THIStrap402@megabit.dk> wrote
in
> > > > message
> > > > > > > > news:unzSUC6dEHA.1644@tk2msftngp13.phx.gbl...
> > > > > > > > > To quote the MS SQL Server 2000's online book on CONTAINS:
> > > > > > > > >
> > > > > > > > > "If the text and asterisk are not delimited by double
> > quotation
> > > > > marks,
> > > > > > > as
> > > > > > > > in
> > > > > > > > > CONTAINS (column, 'text*'), full-text search considers the
> > > > asterisk
> > > > > as
> > > > > > a
> > > > > > > > > character and will search for exact matches to text*."
> > > > > > > > >
> > > > > > > > > Can anybody confirm the above to be true ?
> > > > > > > > >
> > > > > > > > > It is my experience that MS documentation in general - and
> in
> > > > > > particular
> > > > > > > > > their examples - contains a lot of errors, but here they
> > > > > specifically
> > > > > > > > > mention that is should be possible to full-text search for
> > > 'text*'
> > > > > not
> > > > > > > > > counting the asterisk as a wildcard. When I try to
full-text
> > > > search
> > > > > > for
> > > > > > > > > 'text*' using CONTAINS the result includes records
> containing
> > > > 'text'
> > > > > > > > without
> > > > > > > > > the asterisk.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best regards
> > > > > > > > > Johnny Nielsen
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Best regards
> > > > > > > Johnny Nielsen
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>


Relevant Pages

  • Re: opinions on password policies
    ... On Mon, 15 Jan 2001, David Talkington wrote: ... Pick a very long phrase or ... Now choose 8 or 10 characters from it at ... > PGP key: http://www.prairienet.org/~dtalk/dt000823.asc ...
    (FreeBSD-Security)
  • Re: ALERT: WPA-TKIP isnt secure - use WPA2 instead
    ... USE WPA-AES or WPA2 instead of WPA-TKIP ... USE A PASSPHRASE WITH MORE THAN 20 CHARACTERS. ... remember this phrase, type it in more than once, or otherwise make it ... I once tested this method various password crackers. ...
    (alt.internet.wireless)
  • Re: Distinguishing hard spaces from regular spaces
    ... Just change to .MatchWildcards = True. ... A wildcard search using a space character in the Find.Text expression won't ... There's a bug that reverses the replacement when you use the \1 code, ... characters need an unbreakable space before ...
    (microsoft.public.word.vba.general)
  • Optimal wildcard search algorithm
    ... whether or not one or more users exist, given some wildcard expression. ... but we'll focus on username brute-forcing for this ... One approach to finding all usernames would be a kind of breadth-first ... and determine which characters exist in the first position. ...
    (Pen-Test)
  • Re: Router problems.
    ... It's somewhere in the web based Belkin configuration for the router. ... characters for the WPA pass phrase. ... Conventional wisdom suggest 20 characters ...
    (alt.internet.wireless)