Re: Problem with views switching columns
From: Joe Celko (jcelko212_at_earthlink.net)
Date: 11/18/04
- Next message: Louis Davidson: "Re: Inserting big values"
- Previous message: Louis Davidson: "Re: Not to Duplicate items"
- In reply to: Joss57: "Re: Problem with views switching columns"
- Messages sorted by: [ date ] [ thread ]
Date: Thu, 18 Nov 2004 09:02:21 -0800
>> I have a data structure with documents and attributes relating to
these documents. This data are normalized in two tables, documents and
attributes. <<
This is called EAV (Entity-Attribute-Value) design. It is not
normalized, not even close to normalized! It violates 1NF thru DKNF by
not having properly defined domains.
>> So I have some triggers in order to alter the structure of
FlexibleColumnsTable when a new attribute is added. It is not very
common so I can live with the ALTER TABLE into a trigger, but I don't
like it very much to be honest.. <<
Yes, having absolutely no idea what the schema looks like from moment to
moment could be a problem ..
Exploratory medical or other research projects where nothing whatsoever
is known about the nature of the problem or the data model in any
detail, but a huge common vocabulary exists in a machine usable format
and an XML tool is not available.
There is a paper entitled "An Introduction to Entity-Attribute-Value
Design for Generic Clinical Study Data Management Systems" by Prakash
Nadkarni MD from the Center for Medical Informatics at Yale University
Medical School that presents this method.
The problem with Large-scale Clinical Study Data Management Systems
(CSDMSs) is that clinical studies can have an arbitrary number of
clinical parameters in an arbitrary number of patients. You literally
do not know what the reality that you are trying to model looks like
until you explore it. He used a production open-source CSDMS, TrialDB,
to illustrate different aspects of generic CSDMS creation. But he warns
the reader of his papers that:
"At the outset, we must make clear that creation of a generic CSDMS is
not for everybody: it is a major software undertaking, considerable
programming and database expertise must be available, and a lot of
infrastructure must be built before the benefits of a generic
architecture can be realized. Also, we wish to emphasize that building
a clinical study database generically is not the only 'correct' way,
which would imply that using a non-generic architecture is 'wrong'. It
depends on the task at hand: specifically, the number of studies that
must be managed, and their heterogeneity, in terms of the different
sub-domains of medicine that they must represent: the greater the
heterogeneity, the greater the likelihood of a generic architecture
being the 'correct' one. For a single study, the generic approach is
overkill. (Similarly, when there are a bunch of studies dealing with a
single domain, such as cancer, one can get away with a non-generic
design. One of the most functional cancer-oriented CSDMSs, built by
Colin Begg’s group at Sloan-Kettering, is non-generic.)"
The EAV schema remains unaltered if the number of attributes increases
during exploration, as in medical research. However, end users think of
the data as being conventionally structured with one column per
attribute ("parameter" in clinical terms) and based their reports on
that mental model. Similarly, most analytical programs, such as
statistics packages, also require the data in the one column per
parameter format.
As Dr. Nadkarni so nicely phrased it: "Remember that a single EAV table
stores highly heterogeneous facts- apples, oranges, chop suey and wasabi
– all in the same column. It would obviously be impossible to do basic
and multivariate statistics with the data arranged this way. To be
analyzable, this data must be first rearranged into the format that
analytical programs expect. … Therefore, the user interface to a
well-designed EAV system must create the illusion of conventional data
storage through programming sleight-of-hand. The output of queries, or
data extracts destined for analysis, must similarly be formatted
conventionally. To put it another way, the EAV model is designed only
for flexible storage in a clinical data repository. It was never
intended for direct analysis, and the very existence of the EAV
structure must be a well-kept secret as far as end-users are concerned."
These days, XML is becoming the tool of choice for this kind of
semi-structured data, not SQL. You can buy many XML tools that will let
you validate names, tags and so forth and not have to write an elaborate
system from scratch.
Throw this out and get a real RDBMS design. You have no data integrity
and are probably accumulating orphaned records in physical storage.
--CELKO--
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are. Sample data is also a good idea, along with clear
specifications.
*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
- Next message: Louis Davidson: "Re: Inserting big values"
- Previous message: Louis Davidson: "Re: Not to Duplicate items"
- In reply to: Joss57: "Re: Problem with views switching columns"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|