Re: Are distributed partitioned views supposed to improve performance?

Tech Tip: Click here to run a free scan for Windows Errors and optimize PC performance

From: Ian Boyd (admin_at_SWIFTPA.NET)
Date: 01/22/05


Date: Fri, 21 Jan 2005 19:21:32 -0500

The original two goals of partitioning were to:
    1. Improve performance by reducing data
    2. Decrease backup size

The database contains financial transactions. Each cashier for a given day
has their "Session." For a hundred cashiers, each performing hundreds of
transactions a day, each transaction having details - the database was
getting larger than they would like for timely backups. Especially since all
the minute transaction details are pretty much guaranteed to be useless
after a month - never mind a few years.

The design was to archive data older than 90-days onto a separate server -
with similar table structure. Partitioned views were then constructed on the
archive machine to allow "seamless" access to all data from the archive
machine. i.e.

Server: Live
CREATE TABLE Sessions (
    User int,
    OpenDate datetime,
    CloseDate datetime,
    Location int
    ...)

Server: Archive
CREATE TABLE Sessions_90 (
    User int,
    OpenDate datetime,
    CloseDate datetime,
    Location int
    ...)

CREATE VIEW Sessions AS
SELECT * FROM Sessions_90
UNION ALL
SELECT * FROM LinkedServerToLive.dbo.Database.Sessions

There is now a nightly job that searches for any Sessions that are older
than 90 days on the "Live" server, and if so, finds the oldest Session and
inside a distributed transaction: copies all the Session's allociated data
from Live to Archive, and deleted it from the Live.

Each Session has some associated tables. e.g.
Sessions
        Logins
        Openers
                OpenerDetails
        Transactions
                TransactionEntries
        Closers
                CloserDetails
        EndOfShifts
                EndOfShiftDetails
        and a few others

The client insisted on having realtime access to live data while browsing
the archive data.

Now imagine an ah-hoc query where the customer wants all Sessions in the
last 3 years for a specific Cashier, or for a specific location.
The query takes intolerably long, when a remote query issused to the Live
server must return 15,000 rows across the network.

So the archive software had to use many frustratingly painful techniques to
keep performance bearable.
i.e. Query the local "archive" server. Use OPENQUERY to perform the same
query against the remote "live" server. Insert both result sets into a
temporary table; where it is sorted and then displayed to the user.

or

When viewing details about something particular to one session: query the
local "archive" server to see if the Session exists in the archive. If so,
query for the requisite details from the local archive server. Otherwise,
issue an OPENQUERY to the remote "live" server, and get the requisite
details.

And a couple of other techniques, each of which are making it more and more
difficult to extend the software with new features - as we have to keep
coming up with new techniques to improve performance.

In the end, the whole point was to remove "obsolete" data off the "Live"
server; both to speed up queries, and reduce backup size.

Now that it's a few years later, and we now have to add new features to the
"Archive" software; i don't want to keep performing these ad-hoc techniques
to keep the software usable. i want other ideas.

"Adam Machanic" <amachanic@hotmail._removetoemail_.com> wrote in message
news:ODhLUN8$EHA.2676@TK2MSFTNGP12.phx.gbl...
> Maybe you should reconsider your partitioning strategy, then? Rather than
> "A is older than 90 days", it might be a better idea to base it on actual
> date ranges -- so that data doesn't have to move as it ages out. Can you
> post more info about your data and typical queries? Why did you choose a
> partitioned strategy?



Relevant Pages

  • Re: bind() udp behavior 2.6.8.1
    ... "Transaction ID" in the DNS porttion of the packet. ... > ip:port and from no other server on the net. ... You new session is totally ...
    (Linux-Kernel)
  • Re: LINQ with multiple SQL databases (data contexts) and transactions
    ... We'll have FULL Linq support ready at the end of the month ... server A, database A1 and data context on server A, database B1, does ... execution of a query is deferred, so it's not said when the query is ... transaction, and inside that transaction you want to fetch data from a ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: pass a $result from a mysql_query in a url
    ... pass information required for the query (NOT the query ... And BTW - if you're passing the query parms, doing it in the $_SESSION ... Passing it in a post variable means the data has to make 2 trips - server to client, ... For these reasons and more, when you're working with a database, it's often better just to fetch the data on the new page. ...
    (comp.lang.php)
  • Re: Is there a BEA Tuxedo equivalence in Java?
    ... If you maintain state in your application server, ... If you have a state associated to the client session, ... How do you rollback a transaction> that ... A web service is just a POJO that are registered as a service. ...
    (comp.object)
  • Re: Deadlock problem on the 2005 Server
    ... I have this transaction what I want to execute on the MSSQL Server ... My assumption is that it happens because the DELETE query has not ... executing the DELETE query on only then proceed executing the other. ...
    (microsoft.public.sqlserver.server)