Re: Problems with querying date field

Tech-Archive recommends: Repair Windows Errors & Optimize Windows Performance

From: Joe Celko (joe.celko_at_northface.edu)
Date: 03/29/04


Date: Mon, 29 Mar 2004 15:49:48 -0800


>> I'm confused by this. I always use records, fields, and tables <<

There are important differences between a file system and a database, a
table and file, a row and record, and column and field. If you do not
have a good conceptual model, you hit a ceiling and cannot get past a
certain level of competency.

In 25 words or less it is "Logical versus Physical", but it goes beyond
that. A file system is a loose collection of files, which have a lot of
redundant data in them. A database system is a **single unit** which
models the **entire** enterprise as tables, constraints, etc.

Files are independent of each other, while tables in a database are
interrelated. You open an entire database, not single tables within it;
but you do open individual files. An action on one file cannot effect
another file unless they are in the same application program; tables can
interact without your knowledge via DRI actions, triggers, etc.

The original idea of a database was to collect data in a way that
avoided redundant data and had no dependency on a particular programming
language or application. A general, common data repository.

A file is made up of records, and records are made up of fields. A file
is ordered and can be accessed by a PHYSICAL location, while a table is
not. Saying "first record", "last record", and "next n records" makes
sense in a file, but there is no concept of a "first row", "last row",
and "next row" in a table.

A file is usually associated with a particular language -- ever try to
read a Fortran file with a Cobol program? A database is language
independent; the internal SQL datatypes are converted into host language
datatypes.

A field exists only because of the program reading it; a column exists
because it is in a table in a database. A column is independent of any
host language application program that might use it.

In a procedural language, "READ a, b, c FROM FileX;" does not give the
same results as "READ b, c, a FROM FileX;" and you can even write "READ
a,a,a FROM FileX;" so you overwrite your local variable. In SQL,
"SELECT a, b, c FROM TableX" returns the same data as "SELECT b, c, a
FROM TableX" because things are located by name, not position.

A field is fixed or variable length, can repeated with an OCCURS in
Cobol, struct in c, etc. A field can change datatypes (union in 'C',
VARIANT in Pascal, REDEFINES in Cobol, EQUIVALENCE in Fortran).

A column is a scalar value, drawn from a single domain (datatype +
constraints + relationships) and represented in one and only one
datatype. You have no idea whatsoever how a column is PHYSICALLY
represented internally; you never see it. Look at temporal datatypes;
in SQL server, DATETIME is a binary number internally (UNIX style system
clock representation), but TIMESTAMP is a string of digits in DB2 (Cobol
style time representation). When you have a field, you have to worry
about that physical representation. SQL says not to worry about the
bits; think of data in the abstract.

Fields have no constraints, no relationships, and no datatype; each
application program assigns such things and they don't have to assign
the same ones! That lack of data integrity was one of the reasons for
RDBMS systems.

Rows and columns have constraints. Records and fields can have anything
in them and often do!! Talk to anyone who has tried to build a data
warehouse about that <g>. My favorite is finding the part number "I
hate my job" in a file.

When new SQL programmers use IDENTITY, GUID, ROWID, or other
auto-numbering vendor extensions to get a key that can be used for
locating a given row, they are imitating a magnetic tape's sequential
access. It lets them know the order in which a row was added to the
table -- just like individual records went onto the end of the tape!

Dr. Codd defined a row as a representation of a single simple fact. A
record is usually a combination of a lot of facts. That is, we don't
normalize a file; you stuff data into it and hope that you have
everything you need for an application. When the system needs new data,
you add fields to the end of the record. That is how we got records
that were measured in Kbytes.

--CELKO--
 ===========================
 Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!



Relevant Pages

  • Re: select the column name in a certain position
    ... There are important differences between a file system and a database, ... A file is usually associated with a particular language -- ever try to ... the internal SQL datatypes are converted into host language ... clock representation), but TIMESTAMP is a string of digits in DB2 (Cobol ...
    (microsoft.public.sqlserver.programming)
  • Re: Help with building this query
    ... There are important differences between a file system and a database, ... A file is usually associated with a particular language -- ever try to ... the internal SQL datatypes are converted into host language ... clock representation), but TIMESTAMP is a string of digits in DB2 (Cobol ...
    (microsoft.public.sqlserver.programming)
  • Re: select the column name in a certain position
    ... In an RDBMS, you use names and logic, not physical locations, to ... > Files are independent of each other, while tables in a database are ... the internal SQL datatypes are converted into host language ... > about that physical representation. ...
    (microsoft.public.sqlserver.programming)
  • Re: Problems with querying date field
    ... the internal SQL datatypes are converted into host language ... A database is language or anything else-independent so long as you don't ... hard-pressed to do it in a way that is independent of the particular SQL ...
    (microsoft.public.sqlserver.programming)
  • Re: Three Kinds of Logical Trees
    ... that includes all code that uses the dbms api would have all such units ... a variable of type int, with an int value and invoke a method ... ... >> database would have all the data and functions it needs to address ... I have no plans to design a language, but I'm happy to learn anyway ...
    (comp.databases.theory)