Re: Snapshot Relational Data into New Tables

From: David Portas (REMOVE_BEFORE_REPLYING_dportas_at_acm.org)
Date: 09/27/04


Date: Mon, 27 Sep 2004 21:27:12 +0100

The problem is more fundamental than "how do I find a key?". Ask yourself
"what information do I need to store about a customer?" If you are only
storing customer names then apparently you don't need to differentiate
between two customers of the same name. If you did you would presumably
capture information to allow you to distinguish betwen them.

Now consider a simplified version of what a naive schema for a Customer
table might look like:

CREATE TABLE Customers (customer_id INTEGER IDENTITY PRIMARY KEY,
customer_name VARCHAR(50) NOT NULL)

Maybe you see something like this in your data:

customer_id customer_name
----------- --------------------
1 George W Bush
2 John Doe
3 John Doe
4 John Doe

But why did you store John Doe multiple times? The IDENTITY key is different
but that isn't telling you any extra information about Mr John Doe. Was it
the same John Doe? Or two of them on different occassions? Or three of them?
If you cared you would add something to identify them. If you're only
interested in the name then the following will do just as well:

CREATE TABLE Customers (customer_id INTEGER IDENTITY PRIMARY KEY /* The
surrogate key */, customer_name VARCHAR(50) NOT NULL UNIQUE /* The natural
key */)

customer_id customer_name
----------- --------------------
1 George W Bush
2 John Doe

Or, if it's important to know how many times John Doe was a customer, add a
Trasaction Count column.

Hopefully it's obvious from this example that the surrogate key should
always be redundant because there should always be another key. Expert
opinions will divide on just when you should use IDENTITY or whether it is
necessary at all. What's for sure is that a table whose *only* key is
IDENTITY lacks any data integrity and that has dire consequences when you
try to make sense of the data in queries.

> Are there situations where natural keys are impossible

The only situation I can think of is stuff like server and application logs.
These are event driven and it's often hard to enforce the data integrity in
real time. I view this as an ETL problem rather than a data modelling one
but sometimes it can be difficult to avoid if you use a table as an error
log for example.

-- 
David Portas
SQL Server MVP
--


Relevant Pages

  • Re: Snapshot Relational Data into New Tables
    ... > Now consider a simplified version of what a naive schema for a Customer ... > but that isn't telling you any extra information about Mr John Doe. ... > The only situation I can think of is stuff like server and application logs. ... > SQL Server MVP ...
    (microsoft.public.sqlserver.programming)
  • Re: Combining Values
    ... The customer requires it. ... Commerce Server requires it. ... John Doe | Address 2 ... Do this in the reporting layer if at ...
    (comp.databases.ms-sqlserver)
  • Re: Combining Values
    ... The customer requires it. ... Commerce Server requires it. ... the client wants. ... John Doe | Address 2 ...
    (comp.databases.ms-sqlserver)
  • SubReport - Lined Up with Main Report Detail - Repeats any Multivalues??!?!?!
    ... In the main report, I put the customer name and address, and the Order ... John Doe 2/2/2007 ... It lists the three ...
    (microsoft.public.access.reports)
  • Re: report on customer rage
    ... >> verbatim FYI for those with customer service nightmares (and I ... >> As shoppers head into the holiday shopping season, ... >> A shopper at a Michael's craft store is behind a man doing a price ... who rebuilt yur kombi for ya? ...
    (alt.fashion)

Quantcast