Re: why use 4nf?

From: Louis Davidson (dr_dontspamme_sql_at_hotmail.com)
Date: 05/21/04


Date: Thu, 20 May 2004 23:50:39 -0500

Ah, and now the rebuttal from the Identity Lover's party.

> Experienced database designers tend toward intelligent keys they find in
> industry standard codes, such as UPC, VIN, GTIN, ISBN, etc. They know
> that they need to verify the data against the reality they are modeling.
> A trusted external source is a good thing to have.

This is an absolute truth. If you don't include intelligent industry
standard codes in your table, then you are on the road to failure (which to
be quite honest, most of the audience of this rant is directed at.) As
indicated by my affiliation with the IL party, I personally use identities
values for every table. A system I am about to start working with actually
uses GUIDs for their primary keys. This is fine, but it requires 2x or 4x
the number of index pages to store. Like the rant says, this is not
horrible, but it is something to understand.

In any event if you use an "artificial" key as your primary key, it is
essential to apply at least one other UNIQUE key to the table to ensure
uniqueness. Make sure also that the artificial key is not a part of this
key, or you haven't gained anything.

> But let's look at the logical problems. First try to create a table
> with two columns and try to make them both IDENTITY. If you cannot
> declare more than one column to be of a certain data type, then that
> thing is not a datatype at all, by definition. It is a property which
> belongs to the PHYSICAL table, not the LOGICAL data in the table.

You had already said that IDENTITY is a property (really it is more or less
an internal program that generates sequential values with a high scale of
concurrency, with little programming, or in other words, it replaces some
code that the user could write themselves.) and since the datatype is int,
this is not a real issue.

> There are (n!) ways to number (n) rows, so which one do you pick? The
> answer has been to use whatever the *physical* order of the result set
> happened to be. That non-relational phrase "physical order" again!

It just does the exact same thing that we would when we build sequential
keys in our code. Some row has to be first, and some row has to be last.
The rows will be in the same order as we would get in the inserted table.
How would be the best way to do this, say in the case where you are trying
to create three rows at one time, and you need to generate new keys? Like
if you were inserting the rows for the new VIN numbers you were generating
for a batch of vehicles (the last numbers indicate the build order.) Do you
have to do them one at a time? How should this:

> INSERT INTO Foobar (a, b, c)
> SELECT x, y, z
> FROM Floob;

work, if you are not getting the key from the Floob table (which you really
shouldn't be doing, since this would mean that you probably have
normalization issues with the Floob.)

> Using IDENTITY as a primary key is a sign that there is no data model,
> only an imitation of a sequential file system. Since this "magic,
> all-purpose, one-size-fits-all" pseudo-identifier exists only as a
> result of the physical state of a particular piece of hardware at a
> particular time as read by the current release of a particular database
> product, how do you verify that an entity has such a number in the
> reality you are modeling?

> Now change the natural key in Personnel:
>
> UPDATE Personnel
> SET ssn = '666666666'
> WHERE ssn = '000000000';
>
> Nothing happened in Motorpool, did it? You can do the same thing with a
> VIN.

AAAHH. Now you are really confusing me. Why on earth would you do
something like this. If you store the SSN in one and only one place and
refer to it by the primary key of the table where it is stored, you never
have problems like this. Why would
you have the SSN in the motor pool. You would have the personelId.

I don't understand your schema either.

> CREATE TABLE MotorPool
> (id IDENTITY (1,1) NOT NULL PRIMARY KEY,
> ssn CHAR(9) NOT NULL REFERENCES Personnel(id),
> vin CHAR(17) NOT NULL REFERENCES Vehicle(id));
>
> CREATE TABLE Personnel
> (id IDENTITY (1,1) NOT NULL PRIMARY KEY,
> ssn CHAR(9) NOT NULL UNIQUE,
> ..);
>
> CREATE TABLE Vehicles
> (id IDENTITY (1,1) NOT NULL PRIMARY KEY,
> vin CHAR(17) NOT NULL UNIQUE,
> .);

You have a CHAR(9) column referencing an integer column? The way this makes
sense is:

CREATE TABLE MotorPool
(motorPoolId IDENTITY (1,1) NOT NULL PRIMARY KEY,
 personelId int NOT NULL REFERENCES Personnel(personelId),
 vehicleId int NOT NULL REFERENCES Vehicle(vehicleId)),
  unique (personelId, vehicleId)

CREATE TABLE Personnel
(personelId IDENTITY (1,1) NOT NULL PRIMARY KEY,
 ssn CHAR(9) NOT NULL UNIQUE,
 ..);

CREATE TABLE Vehicle
(vehicleId IDENTITY (1,1) NOT NULL PRIMARY KEY,
 vin CHAR(17) NOT NULL UNIQUE,
.);

Now there is only one place to change the social security number. If you
want to change the vehicle for a person, change the personelId. This type
of design requires a lot more joining to get to the pertinent data, but all
values are editable in one and only one place. There is never a need to use
cascading constraints.

>From your other post:

>Why would any SQL programmer think in terms of pointers? You are saying
>that you don't know how to use the language properly.

Because primary keys as well as unique keys perform a very similar role as
pointers do. With the value of a key, you can send a request to the
database server and say "Hey, give me the row in the Blurg table with the
key of <some value>" This is where the similarities end, but this is a
very important similarity. We use the primary key to access individual rows
all of the time. In OLTP systems, a very large percentage of queries deal
with rows one at a time (changes to one rows are usually one at a time,
especially if the system is normalized, because the goal is to store one
value in one place. If we are updating multiple rows with the same value, we
may have a problem.) Singleton modifications are done best with a primary
key, regardless of whether it is a artificial, surrogate, natural, compound
(or whatever) key.

This of course is where the similarities end of course, because SQL is
supposed to insulate us from any physical access. So no matter how the data
is stored, we simple ask SQL for the row or rows. The other incredibly
important distinction is that the pointer is not the only method of
accessing a row logically. SQL is incredibly flexible and doesn't restrict
us to simple pointer access like object oriented systems do (as you
obviously know :).

This was from Mike's reply

>> I may have an Event, where no two events have the same (StartTime,
Venue, RoomNo), but it certainly is easier in code to keep a reference
to an EventID rather than a 3-tuple to identify the event. <<

>If the event has a natural key, like "Johnson Wedding" then you are
>fine. But you still have to carry the 3 column key, too.

This is exactly the point. Just adding a new value that we assign to a row
does not destroy the integrity. And it really makes coding accessing
objects easier, faster, and more reliable. Since all of my tables have a
single integer key, all usage of these objects need only deal with a single
integer as the key. Nothing precludes accessing the data in any other
method, but it is easier this way.

-- 
----------------------------------------------------------------------------
Louis Davidson (drsql@hotmail.com)
Compass Technology Management
Pro SQL Server 2000 Database Design
http://www.apress.com/book/bookDisplay.html?bID=266
Note: Please reply to the newsgroups only unless you are
interested in consulting services.  All other replies will be ignored :)
"Joe Celko" <jcelko212@earthlink.net> wrote in message
news:O%23zyHurPEHA.3220@TK2MSFTNGP09.phx.gbl...