Re: Suggestion for Handling Very Large Tables (SQL 2000)
- From: "Jéjé" <willgart@xxxxxxxxxxxxxxxxx>
- Date: Sat, 25 Feb 2006 09:55:51 -0500
my first suggestion is:
disk subsystem!
increase the number of disks to increase the throughput. and setup a Raid
0+1.
scanning big tables with or without indexes is (near) allways slowdown by
the disks.
1 good IDE drive can read 60MB/s, if your table contain 5Gb of data... then
reading the table is long.
if you have 4 disks in Raid 0 (stripped) you can raise this value to 60 * 4
= 240MB/s.
its not so simple, because index and file fragmentation reduce this
throughput, but its a starting point.
for example, I have a database with a partitionned table which contains 110
millions of rows.
Doing a query like:
select D.column1, count(*) from table T inner join dimension D on D.key =
T.Key group by D.column1
(the key used is the partitionned key)
takes 8minutes on a dev server (1 dual core opteron, 3 IDE drives in raid 0)
During this process the CPU is used at 25%
The same query is slower (10 minutes) on the production server (4 Xeon, SAN)
because we have a SAN issue.
so you see the impact of the disk subsystem.
also, if you plan to do some group by queries, the tempdb database will be
used.
so if the tempdb database use his own set of disks, you'll improve the
performance.
from a developper point of view, maybe you have some control of the queries
executed against the database. So if you plan to aggregate data, maybe your
application (or your storedprocedures) can update the big table and the
aggregated table at the same time, then your "reports" will execute queries
against the summarized table.
but if you can't control what the user ask for, try using AS, specially
AS2005 which provide a lot of options for real time cubes. A webcast
demonstrate how to update a MOLAP cube based on a large table (more then
100millions of rows) in near real time.
"Christian Nunciato" <ChristianNunciato@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in
message news:5BF9AF00-3A74-4044-A203-2E5CAA48D18D@xxxxxxxxxxxxxxxx
Hi all:
I just learned today that a project I'm working on has a couple of very
large tables that need, somehow, to be optimized for searching. One table
has roughly 7 million rows, and another, its relative, has over 16
million.
At present, client applications only insert into these tables, but our
users
have expressed a desire to select from them as well, using a small set of
search criteria, so we need to come up with an appropriate approach. At
the
very least, to continue the application as it exists today, it seems to me
the two tables need to be trimmed down -- perhaps have their data archived
into separate tables, or databases, or some other approach -- but if users
also want to begin reporting on this data, allowing them to select against
it
could bring the server, which is already overburdened, to its knees.
I'm an application developer and not a DBA, so I'm looking to you all for
some best-practices suggestions, here. I've looked briefly into SQL
Analysis
Services, but I'm not sure thatt's the right approach for us, since it
appears Analysis Services is merely an interface for handling data
processing, and our users need real-time data (or as close to real-time as
possible -- big surprise, eh?). I'm wondering if simply segmenting the
data
(which is very simply stored -- user ID, thing ID, date/time stamp, and
that's pretty much it) into separate tables might be an acceptable
approach,
but that approach also seems a little crude to me. How do DBAs handle
very
large t ables like this in a way that's least cumbersome to the
client-side
developers and end users?
Thanks hugely in advance,
Chris
.
- Prev by Date: Re: CRM - Warehouse integration
- Next by Date: Adding Sum column to cube
- Previous by thread: Re: Suggestion for Handling Very Large Tables (SQL 2000)
- Next by thread: Adding Sum column to cube
- Index(es):
Relevant Pages
|