Re: Database Design & Normalization Question

From: AnthonyThomas (AnthonyThomas_at_discussions.microsoft.com)
Date: 11/20/04


Date: Sat, 20 Nov 2004 11:31:04 -0800

Like Mark replied, it depends on the requirements. In that, I agree;
however, where our friend says he starts with User Interface and the process,
this is where I disagree. Although it does DEPEND, data models depend on ONE
thing, the BUSINESS MODEL, which should have reflacted the real world
structure.

You have asked two seperate questions here. So, let's start with the easier
one, the Titles. First of all, this is not a Many-to-Many relationship; it
is a One-to-Many-to-One relationship: Products, Other Titles, Title Types.

I would claim that the Title is an attribute of the Product; moreover, it is
a candidate key. I would require each of my products to have a title. How
else would you define it? Now, some products may have 1 or more Other
Titles, or Additional Titles. This sentence clues you into what kind of
structure this should have. "Some Products" implies a 0, 1, or N relation.
If it were simply 0 or 1, then a NULL attribute may be sufficient.

I tend shy away from NULL attributes and use subsidiary tables unless the
creation and management of that additional table would be overkill with
respect to the original attribute. My favorite example is the Middle Initial
attribute. Now, in this case, seperating that attribute out to a dependent
subsidiary table would be overkill.

In your case, however, this is what we will need. Why? Because, in all
likelihood, the potential table could be a very large attribute and may not
be queried very frequently; so, why embed such a thing in the original table?
 In addition, you will want to potentially have Many additional or subsidiary
titles. This requires the seperation.

This table would be One or Many-to-One against the Product table. Then, you
will probably want to classify the titles based on type, restricted or
otherwise, in the sense that you may allow only one additional title per
type. This would spawn the process of the type table that would be in
One-to-Many correspondence with the Other Titles table.

Let me know if that does not make sense.

Now, the dates. Here is a reason to NOT split this out to a Many-to-Many
relationship: how would you query it? Many-to-Many relationships have a
tendancy, especially the "Type" table kinds, to turn columns (attribute
names) in to data rows. When that happens, you have turned a simple SELECT
column FROM table1 (or joined to table2) from a horridous WHERE clause where
you have to specify multiple filter commands just to discover the record you
are after BEFORE you can determine the "VALUE" column that contains the data.
 I've seen this numerous times and it will kill your system.

Now, I do not want to backtrack by saying that Performance should overrule
design. NO. However, oftentimes bad design will kill performance, no matter
what you do to improve it. Proper design, even when it looks like a
performance killer at first (usually because of the induced additional joins)
will usually save your performance in the long run.

Here is what I mean. A table is a representation of an Entity--a thing, a
noun, a specific item--that "relates" all of the appropriate and dependent
characteristics of that Entity. Now, your products have dates.
Normalization would dictate that these dates would depend on the KEY, the
WHOLE KEY, and NOTHING BUT THE KEY. I will assume that your products have a
primary TITLE (?) as the identifying characteristic. Perhaps it is some sort
of combination, regardless if you have introduced some sort of machine
generated ID surrogate key.

Now, the question is: Are these Dates Entities on their own, or, are they
somehow dependent on the Product Entity? I can see at least two ways this
may be true, although there is surely others.

1. The dates are individually unique attributes. The misnaming of them as
Date1, Date2, etc. may cause someone to believe the table has now violated
1NF or 2NF; however, if those attributes are not generic and follow the KEY
rules, then they are perfectly acceptable as embedded attributes within the
Entity they are related to.

2. There is some form of functional or constrained dependancy between the
dates. This would indicate the prescence and need for higher forms of
normalization: Boyce-Codd NF, 4NF, 5NF, or the new 6NF. These may not be
collection of dates but Product Release Schedules, a different but related
Entity, assigned (or related) to the Product Entity, which have ScheduleDate
as one of their attributes.

Regardless, you will have to decide the classification of these but DO NOT
GENERICIZE the attributes into a Muck Table: ID, Name, Description with the
associated crosstab table with the Value attribute. Type tables are
necessary but not all inclusive. The Value attribute has little value and
should be an indicator that the Entity you have defined has been genericized.
 An Entity only exists if it is well-defined and models a real world element,
not an abstracted OO class.

I hope this makes sense. Feel free to continue the conversation if it does
not.

Sincerely,

Anthony Thomas

"Jason Hedges" wrote:

> Anthony,
>
> Thanks for your reply and the book recommendation.
>
> Can you give me more direction on the date issue? I
> understand the logic behind the books, videos, software,
> etc. What do you do with data like the date fields?
> Should each date type (pub date, street date, etc.) be a
> separate field/column in a table? I thought the way I had
> done it allowed for alot of flexibility because you
> didn't have to change the database design to store new
> types of data that were similar to other types already
> being stored. Here's another example (similar to the date
> scenario):
>
> I have a table that defines "title types" (sub title,
> foreign title, title as it appears on the product, etc.).
> I have a many-to-many relationship table that stores a
> product identifier (relating back to the master product
> table), a title, and a title type identifer (relating to
> the title types table). This seemed flexible since I can
> decide to store a "misc title" simply by adding
> another "title type" to my title types table and using
> it's identifer.
>
> How should I be storing this type of data? Should each
> type of title be a separate field in a table? Should I
> just store NULLS or empty strings when I don't have a
> value for one of these fields?
>
> Thanks!
> Jason
>
> >-----Original Message-----
> >Mark is on a different path, although, headed in the
> right direction.
> >
> >First, you need to understand this, the goals of
> database design and those
> >of application design are predicated on two different,
> sometimes opposing,
> >sometimes cooperative, criteria. Applications are
> designed for data
> >MANIPULATION and PROCESS. Databases, however, are
> designed for data
> >INTEGRITY and IDENTIFICATION.
> >
> >In the sense that you must understand what the User
> wants is absolutely
> >correct. For the application, the GUI and the process
> of information is a
> >perfectly acceptable viewpoint to model the appliction
> design and process.
> >Howerver, this is completely wrong for the design of the
> database system.
> >
> >I am assuming that many of those "unique" attributes are
> based on the
> >product "type." If there are truely common
> caracteristics of Entity Classes,
> >then perhaps a Super Table Subordinate Table relation
> would work better,
> >although the current SQL DBMSes do not fully support
> this sort of structure
> >well.
> >
> >The idea is that there are Products, which have common
> attributes, but,
> >then, there are Books that are dissimilar from software
> and videos. Now, you
> >can call all of these Products, but, then, we must deal
> with the type
> >specific attributes. All in the same table allowed
> NULL? Seperate table and
> >make entries for only those that have data? For all of
> the unique columns or
> >only some because it is likely that you will only
> populate some base on type.
> > Aha, sounds like a subordinate class. You create a
> Book table for the Book
> >entity, a Video table for the Video entity, etc, each
> with its own specific
> >attributes. You can tie all of these back to a common
> Products table to
> >carry the common attributes if you wish. The point is
> that Books are not
> >Videos are not Software, even if they do have common
> characteristics. Just
> >because humans and fish have eyes and mouths does not
> make a human a fish nor
> >a fish a human. Aha, but they are both animals. See
> the structure?
> >
> >The point is that data modeling comes down to describing
> the real world, not
> >as a process, but as a classification, for delineating,
> reducing, individual
> >attributes.
> >
> >I think you basically had the right ideas for most of
> what you described;
> >however, take a look at the seperate or super-sub table
> structure. But the
> >dates, no, that is where you started getting into the
> MUCK area. A
> >pulication date is not a release date any more than a
> start time is the same
> >as an end time. Is your birthdate the same as your
> death date? I certainly
> >hope not!
> >
> >Forget about all of that garbage about normalization
> performance. Research
> >has shown that the proper normalization of data to
> ensure the integrity and
> >reduce the duplicity of data is by far the best means to
> garauntee the best
> >performance. That is why we have DBMSes, to maximize
> the performance GIVEN a
> >normalized database. A DBMS is a physical mechanism,
> the data model and its
> >normalization is a logical one. The two are indepenent,
> although related, or
> >better, derived. That is, the physical is derived from
> the logical, not the
> >other way around. So, performance should never be a
> constraint on the
> >design, only the integrity of the system.
> >
> >Feel free to follow up if you think we could provide you
> with any further
> >assistance.
> >
> >Check out "An Introduction to Database Systems" by C. J.
> Date. I think you
> >would find it enlightening.
> >
> >Sincerely,
> >
> >
> >Anthony Thomas
> >
> >
> >"Jason Hedges" wrote:
> >
> >>
> >> Ok, maybe I've travelled too far down the "everything
> is
> >> a type" road already.
> >>
> >> Here's the scenario: I am designing a new database and
> >> application in SQL Server and .Net for managing
> product
> >> information for bookstores (strictly product data, not
> >> sales, customers, etc.). The current database and
> >> application have been developed with COBOL. The main
> >> types of products we deal with are books, music,
> bibles,
> >> videos, software, and gifts. Every product has some
> >> common attributes (item number, title, retail price,
> >> vendor) and some products have additional attributes
> (sub-
> >> title, measurements, publish date, release date,
> foreign
> >> language title, etc.). Products can have from 1 to 3
> >> contributors (author, artist) and be categorized in 1
> to
> >> 3 categories.
> >>
> >> I've ended up with alot of many-to-many tables. For
> >> example, I have the main products tables, a master
> >> category table, and a many-to-many table that links
> >> products with multiple categories. Same thing for
> >> contributors. Another example: since there are several
> >> types of dates that can be stored for a product
> (publish
> >> date, release date, etc.), I have a "Date Type" table
> and
> >> a many-to-many table that holds a list of dates and
> their
> >> respective "date type" and the product item number
> they
> >> belong to. I thought this would be the best way since
> >> the number of each was an unknown.
> >>
> >> I was under the impression that having a lot of fields
> in
> >> a table that were not used was poor design. Example:
> >> having a SubTitle field in my main product database
> would
> >> be poor design since many products don't have a sub-
> >> title. Is that true or false?
> >>
> >> I'd appreciate some more input if you have enough info
> >> here to go on.
> >>
> >> Thanks!
> >> Jason
> >>
> >> >-----Original Message-----
> >> >I agree with the assessment that no sound judgement
> may
> >> be passed on any
> >> >particular design without the full business rules the
> >> design was based on;
> >> >however, there are a few problems that crop up now
> and
> >> again.
> >> >
> >> >First, database normalization is a classification
> >> mechanism that attempts to
> >> >model a real world system. It is a method of
> logically
> >> seperating out
> >> >individually defined "nouns" or entities.
> >> >
> >> >What I often see is that developers new to database
> >> design tend to apply
> >> >leasons learned from OOP/OOA to data modeling.
> Database
> >> design is a
> >> >reductionist exercise to classify and reduce data
> >> redundancy. OOP/OOA is an
> >> >aggregation of process consolidation whereby a user,
> by
> >> generalization, may
> >> >abstract out common features. These are at two polar
> >> extremes, or orthogonal
> >> >to one another. Be careful.
> >> >
> >> >The first sign that you may have strayed too far is
> when
> >> you start building
> >> >MUCK tables and overly use Many-to-Many
> relationships.
> >> Everything is a
> >> >"type" with an ID, Name, and description, right?
> >> WRONG! Don't go down that
> >> >track.
> >> >
> >> >Sincerely,
> >> >
> >> >
> >> >Anthony Thomas
> >> >
> >> >
> >> >"Anith Sen" wrote:
> >> >
> >> >> >> How far should I go with normalizing my database?
> >> >>
> >> >> As far as you can. In other words, full
> normalization
> >> upto BCNF in case of
> >> >> tables with single column keys ( which will be in
> 5NF
> >> anyway ) and upto 5NF
> >> >> in case of tables with composite keys and
> overlapping
> >> sets of values in
> >> >> different rows are mandatory to avoid data
> >> modification anomalies.
> >> >>
> >> >> >> Is this the correct way to design the database?
> Is
> >> it worth the trouble
> >> >> >> to do it this way?
> >> >>
> >> >> Without a detailed knowledge of your business model
> >> and existing entity
> >> >> types, its attributes and applicable relationships
> >> others cannot comment on
> >> >> a particular design narrative. If you are familiar
> >> with higher normal forms
> >> >> beyond 1NF, every principled decomposition to avoid
> >> moification anomalies is
> >> >> definitely worth the "trouble".
> >> >>
> >> >> BTW, NULLs have nothing to do with any normal forms
> >> beyond 1NF.
> >> >>
> >> >> >> It seems like it would be alot easier for
> INSERTS,
> >> UPDATES, and SELECTS
> >> >> >> to have the extra title fields as part of the
> >> Products database and just
> >> >> >> store empty strings in them if I don't have a
> value.
> >> >>
> >> >> "Ease" of writing INSERTS, UPDATES, and SELECTS is
> not
> >> a design principle;
> >> >> but integrity preservation is. You wouldn't
> consider a
> >> single table for
> >> >> representing the entire schema by such an
> assessment
> >> of ease, would you?
> >> >>
> >> >> >> Just want to confirm that I'm heading in the
> right
> >> direction.
> >> >>
> >> >> If you are considering full normalization in your
> >> logical design process,
> >> >> you are on the right track.
> >> >>
> >> >> --
> >> >> Anith
> >> >>
> >> >>
> >> >>
> >> >.
> >> >
> >>
> >.
> >
>



Relevant Pages

  • Re: Help writing SQL statement in PHP script
    ... IMPOSSIBLE to determine if the database is normalized or not from the ... normalized - AND STILL HAVE DUPLICATES IN THESE COLUMNS. ... It would be nice if you knew something about database normalization ... There is nothing flawed about a design which has three ...
    (comp.lang.php)
  • Re: Database Design Problem
    ... Design is a hard thing to do if you want a good design that is. ... because of bad database design. ... There are lots of normalization rules ... and why I include the customer PO in the OrderDetails is because ...
    (microsoft.public.sqlserver.programming)
  • Re: Newbie question about db normalization theory: redundant keys OK?
    ... information (my understanding of what a normalized database is). ... Or, is there ever a time that you want redundant keys (that is, the ... The answer is yes, there are times when a design is a good one, ... For each normalization form, ...
    (comp.databases.theory)
  • RE: Northwind Template
    ... hand you'd learn more about database design as you go along if you start from ... I would not create a column for your structured CustomerID. ... entire database, but I will give it a try. ... example of its lack of proper normalization can be found in its Customers ...
    (microsoft.public.access.gettingstarted)
  • C# programmer looking for a job
    ... Software Development including Desktop, Client/Server and Database ... Practical skills in object oriented design and design patterns ... XML, Oracle, CVS, VSS, Delphi, bug tracking. ... Developed in Delphi5; ...
    (misc.immigration.usa)