Re: LIKE or CHARINDEX algoritms

From: Steve Kass (skass_at_drew.edu)
Date: 02/09/05


Date: Tue, 08 Feb 2005 23:45:10 -0500

Kuido,

  I don't think the engine is using Boyer-Moore or KMP, but my
guess is that writing your own matching algorithm in T-SQL will
not be much help, unless there is something very particular about
the data that lets you know that naive string matching will often
hit its worst case of nm (small alphabet, long strings, mismatches
rarely near the beginning of the pattern). T-SQL is basically
processed as an interpreted language, and whatever you do will
have an order of magnitude or more penalty over the pre-compiled
routines used by LIKE and CHARINDEX, which were written in
C.

On the other hand, when SQL Server 2005 is released, you may
benefit by writing these in one of the CLR languages.

Here's a little test program that suggests to me that whatever is
being used behind the scenes is not much better than a naive algorithm:

create table T (
  s varchar(4000)
)
insert into T
select
  replicate('WA',500)
from Northwind..Orders
go

declare @t datetime
set @t = getdate()

declare @c int set @c = 0
declare @times int set @times = 15
while @times > 0 begin
  select @c = @c + count(*) from T
  where S LIKE '%WAWAWAWAWAWAWAWAWAWAWAWAWAWA%'
  set @times = @times - 1
end
select @c, datediff(ms, @t, getdate())
go
go

declare @t datetime
set @t = getdate()

declare @c int set @c = 0
declare @times int set @times = 15
while @times > 0 begin
  select @c = @c + count(*) from T
  where S LIKE '%1WAWAWAWAWAWAWAWAWAWAWAWAW%'
  set @times = @times - 1
end
select @c, datediff(ms, @t, getdate())
go
go

declare @t datetime
set @t = getdate()

declare @c int set @c = 0
declare @times int set @times = 15
while @times > 0 begin
  select @c = @c + count(*) from T
  where S LIKE
'%WAWAWAWAWAWAWAWAWAWAWAWAW1234512345123451234512345123451234512345%'
  set @times = @times - 1
end
select @c, datediff(ms, @t, getdate())
GO

GO
drop table T

-- Steve Kass
-- Drew University

Kuido K?lm via SQLMonster.com wrote:

>Hello !
>Does nobody in which algorithm the LIKE and CHARINDEX function based.
>
>Is it meaningful to write itself search algoritms based on
>Knuth-Morris-Pratt or Boyer-Moore algoritms to replace LIKE on CHARINDEX
>functions for speed exhancement ? (I itself not belive)
>
>Is the some ways to speed up LIKE searches ?
>
>Kuido
>
>
>