Re: Anyone out there processing 3 million rows/ hour and 60 million rows/day?

Tech-Archive recommends: Speed Up your PC by fixing your registry



I'll let the community provide you contacts, i.e. other users to talk to,
but I wanted to point out a fiew gotchas.

First, 3M fact rows per hour is pretty doable so long as these are not
dimension updates; just new incremental facts. The general rule of thumb
that I use is that server-class hardware with a quality I/O subsystem (e.g.
maybe a 4X 2GHz system with good RAID -- not necessarily SAN) should be able
to process 1M rows per minute assuming that 1) you've ran the optimize
schema wizard and removed as many of the joins as possible from the SQL
statement 2) that you don't have distinct count measures (which adds
complexity to the SQL statement) and additional processing time (with larger
aggregates), 3) you have tuned your system for a good process buffer size,
and 4) you haven't over-aggregated, i.e. designed thousands of aggregates .
See the AS Performance Guide for best practices in this area:
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ansvcspg.mspx

If doing dimension updates be careful if doing incrementals of a changing
dimension. This will cause all flex aggs to be deleted -- which can cause
significant CPU loading immediately thereafter to recalculate all of the
flex aggs (which is a serial process -- one partition at a time).

Lastly, if you are doing hourly incremental processing, watch for
long-running queries. In order to commit a processing transaction, the
server must set a quiet point (not unlike a checkpoint in the RDBMS). Long
running queries can impact the system because while the query is running the
processing transaction cannot commt. And while the processing transaction is
waiting for an exclusive lock; it blocks new queries from running. Thus you
will find hourly processing very difficult to implement if you have queries
which take 1/2 to run :-) For a normal system with queries of a few seconds
you won't notice the difference.

Hope that helps.
--
Dave Wickert [MSFT]
dwickert@xxxxxxxxxxxxxxxxxxxx
Program Manager
BI SystemsTeam
SQL BI Product Unit (Analysis Services)
--
This posting is provided "AS IS" with no warranties, and confers no rights.


"Jesse O" <jesperzz@xxxxxxxxxxx> wrote in message
news:ue2Vv2AfFHA.580@xxxxxxxxxxxxxxxxxxxxxxx
> Anyone out there processing a large amount of data (million of facts)
> hourly in a 24/7 environment?
>
> I'm embarking on such a project and would like to chat on things you have
> done in order to succeed.
>


.



Relevant Pages

  • Re: Semistructured data
    ... Sampo Syreeni wrote: ... Sets of facts don't have ideas and don't ascribe meanings. ... Those translate into path queries and mutually recursive transitive joins, ...
    (comp.databases.theory)
  • Re: 1955 world series
    ... Facts are stubborn things, but statistics are much more pliable. ... ~ Mark Twain ...
    (rec.sport.baseball)
  • Re: Entering facts from the prompt -- newbie question
    ... It seems on the prolog systems I've looked at, you can only put facts in files and consult the files. ... Do you put ?- in front of the queries? ... Do you specify that the facts should be in a separate file? ...
    (comp.lang.prolog)
  • Entering facts from the prompt -- newbie question
    ... It seems on the prolog systems I've looked at, you can only put facts in files and consult the files. ... If you were typing out a prolog listing for an example to someone else, how do you separate the facts from the queries? ... Do you specify that the facts should be in a separate file? ...
    (comp.lang.prolog)