Re: DB Architecture Questions (for joe celko)



Actually no, the guy isn't. He asked some straight forward questions to which you can reply or not. <<

No, he asked very general question s that deal with RDBMS foundations.
I tried to touch on the major concepts.

[ rows are nothing like records] Oh yes they are! <<

So a deck of punch cards or a mag tape is just like an SQL Schema to
you? Let me do a badly organized "cut & paste"

Like most new ideas, the hard part of understanding what the
relational model is comes in un-learning what you know about file
systems. As Artemus Ward (William Graham Sumner, 1840-1910) put it,
"It ain't so much the things we don't know that get us into trouble.
It's the things we know that just ain't so." Dijkstra also said the
same thing about programming.

If you already have a background in data processing with traditional
file systems, the first things to un-learn are:

(0) Databases are not file sets. Files do not have relationships
among themselves; everything is done in applications. SQL does not
mention anything about the physical storage in the Standard, but files
are based on physically contiguous storage. This started with punch
cards, was mimicked in magnetic tapes, and then on early disk
drives.

(1) Tables are not files; they are parts of a schema. The schema is
the unit of work. I
cannot have tables with the same name in the same schema. A file
system assigns a name to a file when it is mounted on a physical
drive; a table has a name in the database. A file has a physical
existence, but a table can be virtual (VIEW, CTE, query result,
etc).

(2) Rows are not records. Records get meaning from the application
reading them. Records are sequential, so "first", "last", "next" and
"prior" make sense; rows have no physical ordering (ORDER BY is a
clause in a CURSOR). Records have a physical locator, such as
pointers and record numbers. Rows have keys, which are based on
uniqueness of a subset of attributes in a data model. The mechanism
is not specified and it varies quite bit from SQL to SQL.

(3) Columns are not fields. Fields get meaning from the application
reading them -- and may have several meanings depending on the apps.
Fields are sequential within a record and do not have data types,
constraints or defaults. This is active versus passive data! Columns
are also NULL-able, a concept that does not exist in fields. Fields
have to have physical existence, but columns can be computed or
virtual. If you want to have a computed column value, you do in the
application, not the file.

Another conceptual difference is that a file is usually data that
deals with a whole business process. A file has to have enough data
in itself to support applications for that business process. Files
tend to be "mixed" data which can be described by the name of the
business process, such as "The Payroll file" or something like that.
Tables can be either entities or relationships within a business
process. This means that the data which was held in one file is often
put into several tables. Tables tend to be "pure" data which can be
described by single words. The payroll would now have separate tables
for timecards, employees, projects and so forth.

Tables as Entities

An entity is physical or conceptual "thing" which has meaning be
itself. A person, a sale or a product would be an example. In a
relational database, an entity is defined by its
attributes, which are shown as values in columns in rows in a table.

To remind users that tables are sets of entities, I like to use
collective or plural nouns
that describe the function of the entities within the system for the
names of tables. Thus
"Employee" is a bad name because it is singular; "Employees" is a
better name because it is plural; "Personnel" is best because it is
collective and does not summon up a mental picture of individual
persons.

If you have tables with exactly the same structure, then they are sets
of the same kind of elements. But you should have only one set for
each kind of data element! Files, on the other hand, were PHYSICALLY
separate units of storage which could be alike -- each tape or disk
file represents a step in the PROCEDURE, such as moving from raw data,
to edited data, and finally to archived data. In SQL, this should be
a status flag in a table.

Tables as Relationships

A relationship is shown in a table by columns which reference one or
more entity tables.

Without the entities, the relationship has no meaning, but the
relationship can have attributes of its own. For example, a show
business contract might have an agent, an employer and a talent. The
method of payment is an attribute of the contract itself, and not of
any of the three parties. This means that a column can have a
REFERENCES to other tables. Files and fields do not do that.

Rows versus Records

Rows are not records. A record is defined in the application program
which reads it; a row is defined in the database schema and not by a
program at all. The name of the field in the READ or INPUT statements
of the application; a row is named in the database schema. Likewise,
the PHYSICAL order of the field names in the READ statement is vital
(READ a,b,c is not the same as READ c, a, b; but SELECT a,b,c is the
same data as SELECT c, a, b.

All empty files look alike; they are a directory entry in the
operating system with a name and a length of zero bytes of storage.
Empty tables still have columns, constraints, security privileges and
other structures, even tho they have no rows.

This is in keeping with the set theoretical model, in which the empty
set is a perfectly good set. The difference between SQL's set model
and standard mathematical set theory is that set theory has only one
empty set, but in SQL each table has a different structure, so they
cannot be used in places where non-empty versions of themselves could
not be used.

Another characteristic of rows in a table is that they are all alike
in structure and they are

all the "same kind of thing" in the model. In a file system, records
can vary in size,
data types and structure by having flags in the data stream that tell
the program reading the data how to interpret it. The most common
examples are Pascal's variant record, C's struct syntax and Cobol's
OCCURS clause.

The OCCURS keyword in Cobol and the Variant records in Pascal have a
number which tells the program how many time a sub-record structure is
to be repeated in the current record.

Unions in 'C' are not variant records, but variant mappings for the
same physical memory. For example:

union x {int ival; char j[4];} myStuff;

defines myStuff to be either an integer (which are 4 bytes on most
modern C compilers, but this code is non-portable) or an array of 4
bytes, depending on whether you say myStuff.ival or myStuff.j[0];

But even more than that, files often contained records which were
summaries of subsets of the other records -- so called control break
reports. There is no requirement that the records in a file be
related in any way -- they are literally a stream of binary data whose
meaning is assigned by the program reading them.

Columns versus Fields

A field within a record is defined by the application program that
reads it. A column in a row in a table is defined by the database
schema. The datatypes in a column are always scalar.

The order of the application program variables in the READ or INPUT
statements is important because the values are read into the program
variables in that order. In SQL, columns are referenced only by their
names. Yes, there are shorthands like the SELECT * clause and INSERT
INTO <table name> statements which expand into a list of column names
in the physical order in which the column names appear within their
table declaration, but these are shorthands which resolve to named
lists.

The use of NULLs in SQL is also unique to the language. Fields do not
support a missing data marker as part of the field, record or file
itself. Nor do fields have constraints which can be added to them in
the record, like the DEFAULT and CHECK() clauses in SQL.

Relationships among tables within a database

Files are pretty passive creatures and will take whatever an
application program throws at them without much objection. Files are
also independent of each other simply because they are connected to
one application program at a time and therefore have no idea what
other files looks like.

A database actively seeks to maintain the correctness of all its
data. The methods used are triggers, constraints and declarative
referential integrity.

Declarative referential integrity (DRI) says, in effect, that data in
one table has a

particular relationship with data in a second (possibly the same)
table. It is also possible to have the database change itself via
referential actions associated with the DRI.

For example, a business rule might be that we do not sell products
which are not in inventory.

This rule would be enforce by a REFERENCES clause on the Orders table
which references the Inventory table and a referential action of ON
DELETE CASCADE Triggers are a more general way of doing much the same
thing as DRI. A trigger is a block of procedural code which is
executed before, after or instead of an INSERT INTO or UPDATE
statement. You can do anything with a trigger that you can do with
DRI and more.

However, there are problems with TRIGGERs. While there is a standard
syntax for them in the SQL-92 standard, most vendors have not
implemented it. What they have is very proprietary syntax instead.
Secondly, a trigger cannot pass information to the optimizer like
DRI. In the example in this section, I know that for every product
number in the Orders table, I have that same product number in the
Inventory table. The optimizer can use that information in setting up
EXISTS() predicates and JOINs in the queries. There is no reasonable
way to parse procedural trigger code to determine this relationship.

The CREATE ASSERTION statement in SQL-92 will allow the database to
enforce conditions on the entire database as a whole. An ASSERTION is
not like a CHECK() clause, but the difference is subtle. A CHECK()
clause is executed when there are rows in the table to which it is
attached.

If the table is empty then all CHECK() clauses are effectively TRUE.
Thus, if we wanted to be sure that the Inventory table is never empty,
and we wrote:

CREATE TABLE Inventory
( ...
CONSTRAINT inventory_not_empty
CHECK ((SELECT COUNT(*) FROM Inventory) > 0), ... );

it would not work. However, we could write:

CREATE ASSERTION Inventory_not_empty
CHECK ((SELECT COUNT(*) FROM Inventory) > 0);

and we would get the desired results. The assertion is checked at the
schema level and not at the table level.

.



Relevant Pages

  • Re: Syntax to add records if primary key = list
    ... SQL and RDBMS. ... Modern data processing began with punch cards, ... The users might not all have the same database access rights ... This rule would be enforce by a REFERENCES ...
    (comp.databases.ms-sqlserver)
  • Re: Newbie question on table design.
    ... The early SQL systems were also ... built on a queues and locking model because that is what we had. ... a row is defined in the database ... REFERENCES clause on the Orders table which references the Inventory ...
    (comp.databases.theory)
  • Re: Tricky Error
    ... someone could go ahead and inject their own SQL to create their own custom WHERE clause. ... If the cache is going to be big, I make this a seperate script and schedule it to be called daily, etc. ... it also sped up the application in the process as getting the contents of a specific file is quicker than connecting to a database or making a remote connection to fetch content. ...
    (comp.lang.php)
  • SQL statement for left outer join for ranges in the same file
    ... I created the sql statement using the MS-Query tool however ... external database? ... and probable other References to pass syntax check for sqlopen etc. ...
    (microsoft.public.excel.programming)
  • Re: ORDER BY in VIEW not working
    ... one huge database and lots of developers, it makes sense to focus on ... focus on developer productivity instead. ... The fact of the matter is that SQL server used to work the way most ... an ORDER BY clause returned ordered results. ...
    (comp.databases.ms-sqlserver)