Efficient coordinated queries??
- From: "Tom" <tbrown92030@xxxxxxxxx>
- Date: Sun, 5 Apr 2009 21:12:23 -0700
Bear with me, here is some background...
I need to process a lot of data that's being brought in from SQL. A large chunk of it can be brought in with a single query (albeit joining to many tables) and each row represents an item to process. However, there is a significant amount of data that exists in a 0..n relationship to this initial row of data, basically five different sets of data that have a 0..n relationship. For a hypothetical example consider a customer order database where we need to process through all customers and can bring in a row of customer data as our main query but there are 0..n1 rows of customer order data, 0..n2 rows of customer feedback information, 0..n3 rows of customer discounts available, etc.
There is way too much data to bring in at once, into a one big DataSet, and process. Instead I want to bring in the data as individual DataSets representing an item to process. That is, table 0 contains the one row from the main query and table 1 through table 5 have their respective 0..n rows that correspond to the data in table 0.
So, what is the best way to get these individual DataSets built?
1. Establish a SqlDataReader on the first query and, for each row obtained, separately query for the other five sets of data to get their 0..n rows.
2. Establish a SqlDataReader on each query ensuring that the five correlated queries are ordered the same as the first for its primary key (this key is common across all these sources, of course) and examine the primary key/foreign key values for the correlated readers to merge the appropriate rows into a DataSet once the primary row has been obtained.
3. Fully denormalize in a gi-normous SQL query and go back to just a single row of data per item.
4. ???
I've only just started thinking about (3) but would like to avoid this since the n values of some of the 0..n subqueries could be significant (10s, 20s of correlated rows, but enough to make denormalization unpalatable). I've not tried (1) thinking that the act of querying so many times would kill performance (there are potentially millions of items to process) and have been going with (2). However this is still proving to be a bottleneck in processing. I'm hoping there is another way to go that avoids having to issue separate subqueries for each row returned on the primary query but moves more of the burden of merging the correlated data on the SQL server instead of my process.
Any and all feedback and suggestions will be most welcome!
-- Tom
.
- Follow-Ups:
- Re: Efficient coordinated queries??
- From: G.S.
- Re: Efficient coordinated queries??
- From: Miha Markic
- Re: Efficient coordinated queries??
- Prev by Date: Include tables from multiple SQL DBs in an ADO.NET Entity Data Mod
- Next by Date: Re: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
- Previous by thread: Include tables from multiple SQL DBs in an ADO.NET Entity Data Mod
- Next by thread: Re: Efficient coordinated queries??
- Index(es):
Relevant Pages
|
Loading