Re: Performance: Query optimizer producing sub-optimal result?

From: Steve Kass (skass_at_drew.edu)
Date: 09/25/04


Date: Sat, 25 Sep 2004 16:29:12 -0400

Mark,

  The simplified query you show as A. isn't a valid query, so it's hard
to suggest how you might rewrite it so that the function is called only
once per UID. It should be possible, though, by making sure the
function argument depends on something the query processor can't
recognize as known until after the aggregation for DISTINCT takes
place. Here's an example of how to do it using the Northwind database:

-- This query executes dbo.f for each row of Orders
select CustomerID, dbo.f(CustomerID)
from (
  select distinct CustomerID
  from Northwind..Orders
) T
where dbo.f(CustomerID) > 73
go

-- This query executes dbo.f once for each distinct CustomerID
select CustomerID, dbo.f(CustomerID)
from (
  select CustomerID, max('') as hack
  from Northwind..Orders
  group by CustomerID
) T
where dbo.f(CustomerID+hack) > 73

There are also a number of other solutions to the problem of
accommodating legacy information, such as keeping the legacy rows in a
separate table, subject to CHECK constraints or triggers, or using
referential integrity, indexed views, or computed columns creatively.
Given what you've said, I think one option might be an indexed view over
something like

select MainPart, <some aggregates>
from myTable
where <MainPart is a legacy code>
group by MainPart

If you'd like to provide more specific information, maybe we can offer a
more specific answer. Your footnotes suggest there is an efficient
solution, but it would help to see the definition of fnCheck.

Steve Kass
Drew University

Mark Ayliffe wrote:

>Apologies for the long post, I'm trying to describe my problem as fully as I
>can:
>
>As part of a suite of SQL statements I am trying to write a data integrity
>validation statement[2] to check the consistency of a legacy unique ID[1]. I
>have managed to produce a working solution basued on a user defined function
>which validates a set of unique IDs in a small group (2 to 100) of related
>records. My problem is that almost whatever I try, the Query optimizer is
>reorganising things so that my function is called more frequently than
>necessary. The function declares and opens a cursor on to the table so that
>I can navigate between the records checking each row as I go, so of course
>it has a non-trivial cost. I've tried several different approaches:
>
>A.
>SELECT UID, fnCheck(UID)
>FROM (SELECT DISTINCT MainPart from myTable)
>WHERE fnCheck(UID) <> 0
>[3]
>
>The optimizer seems to move the "DISTINCT" action in the subquery to the end
>of the process. Using the Server Trace shows that fnCheck(UID) is called
>once per record in MyTable, not once per DISTINCT MainPart as I'd expected.
>OK, on to...
>
>B. Define a View which performs the SELECT DISTINCT part of option A. My SQL
>statement becomes:
>
>SELECT UID, fnCheck(UID)
>FROM MAINUIDVIEW
>WHERE fnCheck(UID) <> 0
>
>Amazingly this *still* calls fnCheck(UID) once per record in the base table,
>not only once per row in the view.
>
>C. Take the contents out of the function and make the whole check an inline
>procedure, making the cursor include the whole table and cope with the
>switch of MainPart as we proceed. This appears to be as slow as the original
>version, although it doesn't have the unnecessary repeats. I suspect that
>the cursor on a large table (typically the table involved can have between
>10's of thousands of records to a few million) is making the process rather
>slow.
>
>D. Use a temporary table, put the DISTINCT main parts in there and use that
>in the query. This is my current method, but it seems a little over the top
>for what I'm trying to do. And though the table generation is very quick,
>the whole process is still disappointingly slow. This is why I'd like the
>function calls to be minimised.
>
>Obviously my (scalar) function is rather expensive and the Optimizer isn't
>able to take this into account. Have I missed something obvious here? Is
>there a better technique I could be using? I've tried fiddling with the
>NOEXPAND view hint, but as I don't have an index on the view (I think) SQL
>Server doesn't accept it. Are there other ways that I can bend the Optimizer
>to my will :-)
>
>Thanks for reading this far!
>
>Mark
>
>
>[1] The Unique Id takes the form: [Main part][sep][Minor part]. A related
>set of records have the same [Main part], the (single character) [sep] is
>the same for all but the last record in the set and the [Minor part] is a
>"stringified" number, starting at 1 and increasing as necessary. A valid set
>of UIDs might look like:
>Wibble-0001
>Wibble-0002
>Wibble=0003
>This format has been in use in our systems for well over a decade, I would
>have done it differently, but I'm stuck with it.
>
>[2] I need to validate:
> - Valid characters only used in each position of the string. This seems to
>work fine in a separate function and isn't really part of my question.
> - [Minor part] starts at 1 and increments sequentially
> - [sep] is the correct value for each record in a group, the highest
>numbered [Minor Part] has the "terminating" separator.
>
>[3] There are a couple of short where clauses which rule out some forms of
>UID which I don't want to check. I've left these out for clarity.
>
>
>



Relevant Pages

  • Re: Query is failing in Runtime
    ... Assuming that the unique field is CustomerID, and that is really the only ... I have a query that I call to fill a list box. ... strOrder = " ORDER BY LastFirst" ... strOrder = " ORDER BY Address, ...
    (microsoft.public.access.queries)
  • RE: need similar solution to "cascading combos"...
    ... Is there a way in code to access the subform's recordset in a query. ... So let's say that Combo1 has a query that includes CustomerID and ... This will cause Combo2 to filter it's data based on the value in Combo1 ... boxes, I used several unbound combo boxes for Customer Name, Product Name, ...
    (microsoft.public.access.forms)
  • Re: Syntax error (missing operator) in query expression
    ... you can dynamically create a pass-through query in code. ... is it possible to run stored procedure as ... @fromDate datetime ... GROUP BY customerID) qDRI2 ...
    (microsoft.public.access.queries)
  • Re: Ignore duplicate records
    ... It slows down when you decide the performance is too slow. ... Then you use that query and the customers table to build an unmatched query ... "John Spencer" wrote: ... SELECT CustomerID ...
    (microsoft.public.access.queries)
  • RE: Create report with three rows of data representing years
    ... I would scratch your current monster query and build a union ... SELECT CustomerID, Jan05 as Sales, #1/1/2005# as SalesDate ... table with a union query and then create a crosstab based on your union query. ... I have a fairly complex report that I am trying to design. ...
    (microsoft.public.access.reports)