Performance: Query optimizer producing sub-optimal result?

From: Mark Ayliffe (me_at_privacy.net)
Date: 09/24/04


Date: Fri, 24 Sep 2004 16:11:56 GMT

Apologies for the long post, I'm trying to describe my problem as fully as I
can:

As part of a suite of SQL statements I am trying to write a data integrity
validation statement[2] to check the consistency of a legacy unique ID[1]. I
have managed to produce a working solution basued on a user defined function
which validates a set of unique IDs in a small group (2 to 100) of related
records. My problem is that almost whatever I try, the Query optimizer is
reorganising things so that my function is called more frequently than
necessary. The function declares and opens a cursor on to the table so that
I can navigate between the records checking each row as I go, so of course
it has a non-trivial cost. I've tried several different approaches:

A.
SELECT UID, fnCheck(UID)
FROM (SELECT DISTINCT MainPart from myTable)
WHERE fnCheck(UID) <> 0
[3]

The optimizer seems to move the "DISTINCT" action in the subquery to the end
of the process. Using the Server Trace shows that fnCheck(UID) is called
once per record in MyTable, not once per DISTINCT MainPart as I'd expected.
OK, on to...

B. Define a View which performs the SELECT DISTINCT part of option A. My SQL
statement becomes:

SELECT UID, fnCheck(UID)
FROM MAINUIDVIEW
WHERE fnCheck(UID) <> 0

Amazingly this *still* calls fnCheck(UID) once per record in the base table,
not only once per row in the view.

C. Take the contents out of the function and make the whole check an inline
procedure, making the cursor include the whole table and cope with the
switch of MainPart as we proceed. This appears to be as slow as the original
version, although it doesn't have the unnecessary repeats. I suspect that
the cursor on a large table (typically the table involved can have between
10's of thousands of records to a few million) is making the process rather
slow.

D. Use a temporary table, put the DISTINCT main parts in there and use that
in the query. This is my current method, but it seems a little over the top
for what I'm trying to do. And though the table generation is very quick,
the whole process is still disappointingly slow. This is why I'd like the
function calls to be minimised.

Obviously my (scalar) function is rather expensive and the Optimizer isn't
able to take this into account. Have I missed something obvious here? Is
there a better technique I could be using? I've tried fiddling with the
NOEXPAND view hint, but as I don't have an index on the view (I think) SQL
Server doesn't accept it. Are there other ways that I can bend the Optimizer
to my will :-)

Thanks for reading this far!

Mark

[1] The Unique Id takes the form: [Main part][sep][Minor part]. A related
set of records have the same [Main part], the (single character) [sep] is
the same for all but the last record in the set and the [Minor part] is a
"stringified" number, starting at 1 and increasing as necessary. A valid set
of UIDs might look like:
Wibble-0001
Wibble-0002
Wibble=0003
This format has been in use in our systems for well over a decade, I would
have done it differently, but I'm stuck with it.

[2] I need to validate:
 - Valid characters only used in each position of the string. This seems to
work fine in a separate function and isn't really part of my question.
 - [Minor part] starts at 1 and increments sequentially
 - [sep] is the correct value for each record in a group, the highest
numbered [Minor Part] has the "terminating" separator.

[3] There are a couple of short where clauses which rule out some forms of
UID which I don't want to check. I've left these out for clarity.

-- 
 Real email address  |
     is mark at      |   What happens if you get scared half to death twice?
   ayliffe dot org   |


Relevant Pages