Re: Design Question - Accounts/Transactions



The indexed view can't return anything about individual transactions. The
reason is that the indexed view contains only one row per account. That row
constains only the account_id, the total of the transaction details for that
account and a count of the transaction details for that account. So, it is
helpful in getting the sum of ALL transaction details the account, but here,
you want to return data about individual transactions, so that indexed view
can't help.

The query you are running is looking for the the first 100 transactions for
account @account_id sorted by transaction_date desc. But the only index you
have on the transaction table (assuming I found everything correctly from
this thread) is a clustered index on transaction_id. That index is no help
in finding the accounts. This is somewhat the equivalent of me handing you
the New York City phone book and asking you to find all people who have a
phone number between 800-555-0000 and 800-555-9999. It would take you a
long time. But if you had a reverse phone book, one that listed all the
phone numbers in phone number order, you could find the answer quickly. In
the same manner, you want an index that will help this query. But with the
index you have, the only way a query that wants the most recent 100
transactions for a given account can operate is first read all 1,000,000
rows to find the 300,000+ rows that go with this given account. As bad, or
actually worse, after it gets the 300,000+ rows it needs to sort them in
order of the group by columns in order to group rows together. Then it has
to sort all 300,000+ rows again, this time to get them in order by
transaction_date. Then it throws away all but 100 rows and returns the
result

Choosing correct indexes is extremely important when dealing with tables
with large numbers of rows. Unfortunately, deciding on the correct indexes
is much more of an art than a science.

One of the most important thing to get right is the clustered index. But
you only get one clustered index per table. And there are several
charactistics that you want in your clustered index. And, for a given
table, it may be impossible to get optimal conditions for all of these
conditions at the same time. Mostly, not necessarily in any particular
order except that first one is almost always the most important, you want:

First, clustered indexes are wonderful for range queries (give me all log
records between the date and that date - clustered index on logdate, give me
all orders for a given customer - clustered index on customer_id, give me
all transactions for a given account - clustered index on account_id. etc). So you want to pick a clustered index that is useful for as many range queries as possible.

Second, the clustered index key(s) should be as small as possible, since all
of these keys will be included in every nonclustered index on the table.

Third, the clustered index should be unique. If it is not a 4 byte
"duplicate resolver" may added to entries in both the clustered index and
included in every nonclustered index on the table.

Fourth, the clustered index key(s) should not change very often, because
when they do, the row in the table will almost certainly have to be
physically moved, and you will need to update the entry for that row in
every nonclustered index in the table.

Fifth, it is a good thing if all (or at least most) of the new rows gets added
to the table at the end of the clustered index. This prevents page splits

Your clustered index key (transaction_id) nicely does numbers 2, 3, 4, and
5. Unfortunately, at least for this query, it does not handle number 1.
And you probably will almost never have queries of the form - give me all
transactions with transaction_id between 324,576 and 327,801. So this
clustered index will not help any of your range queries and that's not good.
So based on this query, I would probably go with a NONclustered primary key
of transaction_id and a unique clustered index on (account_id, transaction_date, transaction_id).

Of course, when choosing your clustered index, you must also take into
account all the other queries you will be running against this table in
order to pick the one overall best clustered index.

If the clustered index you end up choosing is not helpful for this query,
then you probably need a nonclustered index. The best possible index for a
given query is a covering index(it's usually even better than a clustered
index). That is an index where every column used in the query is also
either a key or an included column in the index. In this case it might look
something like:

Create Unique NonClustered Index <index name> On dbo.Transaction
(account_id, transaction_date, transaction_id) Include (transaction_type_id,
payee_id)

That should make your query run very fast. However, maintaining this index
will slow down the system when rows are added/deleted/updated to the
transaction table (BTW, I would not call this table "transaction".
Transaction is a reserved word in SQL.) Also, the size of this index will
be a substantial fraction of the size of the whole table. So as you can see,
you want to be careful how many nonclustered indexes you add.

But this query will never run well without an appropriate index, and I
assume you believe this query will be run at least moderately often. So,
whether a clustered or nonclustered index, you need to have an index that
will make this query efficient. So I would add such an index and try this
query again.

Hopefully, the query will then run quickly. But maybe it won't. The
problem may be that the Query Optimizer isn't smart enough. The optimizer
does a great job, but sometimes it picks a really bad plan. If it still
runs slowly, try creating a temporary table or table variable, get the top
100 transactions and put them in the temp table or variable, then use that
in your actual query. Something like:

ALTER Proc [dbo].[usp_transactions]
@account_id INT
AS
CREATE TABLE #SelectedTransactions (
account_id int NOT NULL,
transaction_date datetime NOT NULL,
payee_id int NOT NULL,
transaction_type_id int NOT NULL;

INSERT #SelectedTransactions(transaction_id, account_id, transaction_date,
payee_id, transaction_type_id)
SELECT TOP 100 transaction_id, account_id, transaction_date, payee_id,
transaction_type_id
FROM dbo.Transaction
WHERE account_id = @account_id
ORDER BY transaction_date desc, transaction_id desc;

SELECT
TOP 100 t.transaction_id,
t.transaction_type_id,
t.transaction_date,
p.name as payee,
t.payee_id,
ttype.name as Transaction_Type,
CASE WHEN COUNT(*) > 1 THEN '(Split)' ELSE MAX(sc.name) END AS
SubCategory,
SUM(td.amount) Amount
FROM #SelectedTransactions t
INNER JOIN dbo.Transaction_Detail td ON td.transaction_id =
t.transaction_id
INNER JOIN dbo.Transaction_Type ttype ON ttype.transaction_type_id =
t.transaction_type_id
INNER JOIN dbo.Payee p ON p.payee_id = t.payee_id
INNER JOIN dbo.SubCategory sc ON sc.subcategory_id =
td.subcategory_id
WHERE t.account_id = @account_id
GROUP BY
t.transaction_id,
t.transaction_type_id,
t.transaction_date,
p.name,
t.payee_id,
ttype.name
ORDER BY t.transaction_date desc, t.transaction_id desc

That, in combination with the appropriate index should give you a query that
runs very quickly.

BTW, I also added t.transaction_id desc to the ORDER BY to make sure that if
you have ties on transaction_date, and there is a tie for the 100th row, you
always get a consistant, well defined set of rows instead of sometimes
getting one of ties and maybe sometimes getting a different one of the ties.

Tom

"Craig Lister" <cdotlister@xxxxxxxxx> wrote in message
news:788f0361-a0ee-43c8-adca-36d5bd920217@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Server Melt Down! :)

OK, the indexed view seems to work amazingly. I have pumped in
1,000,000 transactions into the Transaction table, and then 1,100,000
Transaction_Details lines into Transaction_Detail.

Getting the list of accounts (I have 3 Accounts at the moment) works
100%. The screen is up in a second with rather large looking
totals... $915,964,000.00

So that's looking good. The problem is when I select an account to
view the transactions.

CPU hits 100%, and stays there.

I'm using a SP.. No Indexed view or anything. Here's the Sproc:

ALTER Proc [dbo].[usp_transactions]
@account_id INT
AS
SELECT
TOP 100 t.transaction_id,
t.transaction_type_id,
t.transaction_date,
p.name as payee,
t.payee_id,
ttype.name as Transaction_Type,
CASE WHEN COUNT(*) > 1 THEN '(Split)' ELSE MAX(sc.name) END AS
SubCategory,
SUM(td.amount) Amount
FROM dbo.[Transaction] t
INNER JOIN dbo.Transaction_Detail td ON td.transaction_id =
t.transaction_id
INNER JOIN dbo.Transaction_Type ttype ON ttype.transaction_type_id =
t.transaction_type_id
INNER JOIN dbo.Payee p ON p.payee_id = t.payee_id
INNER JOIN dbo.SubCategory sc ON sc.subcategory_id =
td.subcategory_id
WHERE t.account_id = @account_id
GROUP BY
t.transaction_id,
t.transaction_type_id,
t.transaction_date,
p.name,
t.payee_id,
ttype.name
ORDER BY t.transaction_date desc

So, like I asked above, is there a way to have the Indexed View assist
here, and also, allow for the MAX(sc.name)?

Or should I use the same Indexed view, and then just join to the
SubCategories table in an SP?



.



Relevant Pages

  • RE: Display queried records with Null values (null recordcount)
    ... tables related to Assets and Transactions tables in the query, ... since AssetID from tblAssets is a foreign ... The query would return multiple rows per account, ...
    (microsoft.public.access.gettingstarted)
  • RE: Display queried records with Null values (null recordcount)
    ... Querying tblTransactions and tblAccounts joined by outer join on AccountID ... query that performs the first join and then include that query in your SQL ... The query would return multiple rows per account, ... DSUM function would sum the values from a query which joins the transactions ...
    (microsoft.public.access.gettingstarted)
  • RE: Display queried records with Null values (null recordcount)
    ... "...It might work as a single query with an outer join between the assets ... tables related to Assets and Transactions tables in the query, ... The query would return multiple rows per account, ...
    (microsoft.public.access.gettingstarted)
  • RE: Display queried records with Null values (null recordcount)
    ... "Ken Sheridan" wrote: ... tables related to Assets and Transactions tables in the query, ... The query would return multiple rows per account, ...
    (microsoft.public.access.gettingstarted)
  • Re: Indexes and primary keys, from Delaney
    ... > I generally query and/or group on SSN, often on the combination of SSN ... > and account number, ... Whereas with a clustered index on the composite key, ...
    (microsoft.public.sqlserver.programming)

Loading