Re: Insert with response

From: Joe Celko (jcelko212_at_earthlink.net)
Date: 11/06/04


Date: Sat, 06 Nov 2004 14:20:16 -0800


>> An IDENTITY can be great as an additional, alternate key[sic].
Suppose you have a table whose natural key is composed of four columns,
. <<

Why did you call it a key when **by definition**, it is not? Do you
like the accurate term "exposed non-relational physical locator"
instead? :)

Actually, IDENTITY is a horrible choice, There is no way to verify it,
to port it or reference it safely. Since there is no way to relate
these two "keys", which of these two tables is correct? That is, which
one has BOTH keys referencing the proper entity, as it exists in the
real world? How did you put them in synch? if you cannot do that, you
have no data integrity.

CREATE TABLE FooBar
(id IDENTITY NOT NULL PRIMARY KEY,
 real_key CHAR(5) NOT NULL UNIQUE,
 ...;

CREATE TABLE BarFoo
(...
 foo_key INTEGER NOT NULL
     REFERENCES FooBar(id),
 bar_key CHAR(5) NOT NULL
     REFERENCES FooBar(real_key),
 PRIMARY KEY (bar_key, foo_key),
 ..);

Foobar
=======
1 'a'
2 'b'
3 'c'

BarFoo
=======
1 'c'
2 'a'
3 'b'

>> Now, the other tables that have to refer to this table can use just
one integer column (referring to the identity column) instead of four
long character columns. The space for the original table will increase,
but the space for the other tables will decrease and joins will be
carried out faster - the only downside is that you'll need to use a join
more often, but in this particular case the pros will outweigh the cons.
<<

Have you actually measured the performance differences in an INTEGER
versus CHAR(n) columns in joins? The usual answer is that nobody has
and the myth continues among lazy, sequential file programmer who do not
want to learn RDBMS.

Given hard disk access time, the slowest operation in the hardware by
microseconds versus nanoseconds, what does the extra column do for table
scans, the most common database operation?

One of the other problems is that IDENTITY is *much* more subject to bad
data entry. If I have a valid IDENTITY of 56743, then I probably also
have a valid IDENTITY of 56734 in my data. I'd have to look it up
again, but the error rate in typing an integer of 5 digits long is <2%
with a skilled typist.

--CELKO--
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are. Sample data is also a good idea, along with clear
specifications.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!