Re: Design for historical data

From: Louis Davidson (dr_dontspamme_sql_at_hotmail.com)
Date: 10/29/04


Date: Thu, 28 Oct 2004 22:44:09 -0400


> Given the very brief overview, #2 is usually the way to go. If the three
> types of visits are really that, 3 separate types of visits, which store
> unique and different types of data, then they should be 3 separate
> tables. You can do all the formatting for the display on the front end
> when you need to make them all fit together for the UI.

I second this. And we call it a subtype. Parent table to hold the
supertypes columns that are like, and subtype tables to hold the
differences. Use the primary key of the supertype table as the key (same
name) of the subtype tables, and have a discriminator column in the
supertype to tell you which subtype the row is in.

I know it sounds a bit messy (and it is to start with) but it is a lot
cleaner to keep the data correct, and to use. It does take more display
code, but less of the far more troublesome data validation code.

> As to "how many rows are too many"...well, that's really more based on
> your hardware. As an example, I'm currently running a database that has
> about 10 million rows of data in it that gets used daily (data only
> added/updated once a week though) and it runs fine on some crappy server
> that was just laying around.

And 10 million rows is small in comparison to some systems. We hit the 2
billion rows of page delivery history in a system I helped build (I know
because the people who were left behind told me that they had to change to
bigints! Good design is important, decent, well balanced hardware (disk,
cpu, memory, etc all balanced, not slow disk and fast cpu, etc)

> More times than not, if there is a bottleneck while retrieving data, it
> is going to be poor table design and poor indexing. Hardware can be an
> issue (especially memory and disk acccess) but don't go spend 10k
> upgrading your server until you've optimized your queries.

Amen. Just adding faster hardware can get you almost linear increases in
performance (add another processor and you can almost process twice as much)
but adding an index to a query can give you exponential increases in
performance of queries. Bulk up hardware only once you have optimized
everything, and you are dealing with blocks (hardware or software) that just
cannot be beaten. Otherwise you are just wasting precious bonus money :)

-- 
----------------------------------------------------------------------------
Louis Davidson - drsql@hotmail.com
SQL Server MVP
Compass Technology Management - www.compass.net
Pro SQL Server 2000 Database Design -
http://www.apress.com/book/bookDisplay.html?bID=266
Note: Please reply to the newsgroups only unless you are interested in
consulting services.  All other replies may be ignored :)
"Zach Wells" <none@by.com> wrote in message
news:uIBIDuVvEHA.908@TK2MSFTNGP11.phx.gbl...
> Given the very brief overview, #2 is usually the way to go. If the three
> types of visits are really that, 3 separate types of visits, which store
> unique and different types of data, then they should be 3 separate
> tables. You can do all the formatting for the display on the front end
> when you need to make them all fit together for the UI.
>
> As to "how many rows are too many"...well, that's really more based on
> your hardware. As an example, I'm currently running a database that has
> about 10 million rows of data in it that gets used daily (data only
> added/updated once a week though) and it runs fine on some crappy server
> that was just laying around.
>
> More times than not, if there is a bottleneck while retrieving data, it
> is going to be poor table design and poor indexing. Hardware can be an
> issue (especially memory and disk acccess) but don't go spend 10k
> upgrading your server until you've optimized your queries.
>
> Zach
>
> ER Slansky wrote:
> > Because my only real design experience is with Access, I don't know how
> > powerful SQL Server (2k) can be and the best way to design for it.
> >
> > Moving from Access because table of historical data is simply getting
> > too large.  I will have to keep history on an individual potentially for
> > years (for doctors office) because the same doctor may not be the one to
> > see an individual each time they come in.  All prior history must be
> > called up to review when making a diagnosis.
> >
> > There are also three types of visits with really nothing in common
> > except the patient, but the history of all three types of visits at
> > times must be displayed and other times only the history of a specific
> > type of visit displayed.
> >
> > So, and I apologize if this is a trivial question for experienced
> > designers, I see two options:
> >
> > 1) Combine everything in one table with indexes on visit type and
> > patient - this would result in largest number of records in one table
> > with the fields that are dissimilar containing nulls.
> >
> > 2) Three separate tables for each type of visit containing only the
> > specific fields for that visit type and index off patient.  To get a
> > complete history, I would have to search all three tables for the
> > patient, but when only needing history on one visit type, I would only
> > have to go to that particular table.
> >
> > In Access, the design is currently one large table and to search, even
> > with an index on patient, is taking longer as the number of records
> > grows.  I know SQL Server can hangle it much better, but would it handle
> > either option equally well? There are quite a few adds, and
> > modifications to the "current" visit record for maybe 2 or 3 days after
> > a visit, but really hardly ever any deletes.
> >
> > How many records is "too many" for SQL - or is there such a thing??
> >
> > Thanks,
> > ER Slansky
> > San Antonio, TX
> >
> >
> > *** Sent via Developersdex http://www.developersdex.com ***
> > Don't just participate in USENET...get rewarded for it!


Relevant Pages

  • Design for historical data
    ... Because my only real design experience is with Access, ... powerful SQL Server can be and the best way to design for it. ... All prior history must be ... patient - this would result in largest number of records in one table ...
    (microsoft.public.sqlserver.programming)
  • Re: Design for historical data
    ... types of visits are really that, 3 separate types of visits, which store ... is going to be poor table design and poor indexing. ... All prior history must be ... > patient - this would result in largest number of records in one table ...
    (microsoft.public.sqlserver.programming)
  • Re: Optimizing the Speed / Design tradeoff in numeric applications
    ... >> calculations to arrive at three values, named d2, t0 and theta. ... > I agree with Pete that whenever possible, design should trump ... My attitude would be to first separate the external world ... typedef struct gazouta { ...
    (comp.programming)
  • RE: "Module" option does not show up when setting permissions.
    ... It's by design. ... Access 2000 introduced the separate VBA ... Project for handling the code (notice the separate code window (VB Editor) ... an MDE database file out of it if you want to hide the source code from ...
    (microsoft.public.access.security)
  • Re: Forest, Domain, OU design question
    ... It seemed that the exam was more focused on GPO, NT 4 DNS situations and RIS then any real situations regarding AD design. ... >> they wanted separate schemas or keep administration separate. ... > Domains are REQUIRED for "diffferent security> ACCOUNT policies" > although sloppiness is possible on any particular> question. ... > Domains may be required/desired if you need "complete> control" of resources, mirroring NT domain structures>, and for either/both "massive> number of objects" and to "control replication" in WANS. ...
    (microsoft.public.win2000.active_directory)