Re: Filtering in an N to N relationship
From: David Scemama (david.nospam_at_wanadoo.fr)
Date: 02/24/04
- Next message: Roji. P. Thomas: "Re: Insert ?"
- Previous message: David Scemama: "Re: Filtering in an N to N relationship"
- In reply to: Joe Celko: "Re: Filtering in an N to N relationship"
- Next in thread: Chris2: "Re: Filtering in an N to N relationship"
- Messages sorted by: [ date ] [ thread ]
Date: Tue, 24 Feb 2004 11:23:01 +0100
Thanks a lot for you clear and complete answer.
David
"Joe Celko" <joe.celko@northface.edu> wrote in message
news:urcvosj%23DHA.3284@TK2MSFTNGP09.phx.gbl...
> Please post DDL, so that people do not have to guess what the keys,
> constraints, Declarative Referential Integrity, datatypes, etc. in your
> schema are. Here is a wild ass guess:
>
> CREATE TABLE Products -- note use of plural names
> (upc CHAR(10) NOT NULL -- note use of industry standards
> PRIMARY KEY,
> ..);
>
> The use of criteria for looks like characteristics is weird. Criteria
> asks a question and sounds like it should be what you are calling
> "Filters"; mind if I change that? Work on a better name, like
> "packaging", that makes sense in the data model and avoid these vague
> general terms.
>
> CREATE TABLE Packaging
> (package_code CHAR(10) NOT NULL
> PRIMARY KEY,
> ..);
>
> And the (n:m) table is:
>
> CREATE TABLE Characteristics
> (upc CHAR(10) NOT NULL
> REFERENCES Products(upc)
> ON DELETE CASCADE
> ON UPDATE CASCADE,
> package_code CHAR(10) NOT NULL
> REFERENCES Packaging(package_code)
> ON DELETE CASCADE
> ON UPDATE CASCADE,
> PRIMARY KEY (upc, package_code)):
>
> >> I need to select all the products that match a set of criterias. <<
>
> CREATE TABLE Criteria
> (package_code CHAR(10) NOT NULL PRIMARY KEY
> REFERENCES Packaging(package_code)
> ON DELETE CASCADE
> ON UPDATE CASCADE);
>
> Relational division is one of the eight basic operations in Codd's
> relational algebra. The idea is that a divisor table is used to
> partition a dividend table and produce a quotient or results table. The
> quotient table is made up of those values of one column for which a
> second column had all of the values in the divisor.
>
> This is easier to explain with an example. We have a table of pilots
> and the planes they can fly (dividend); we have a table of planes in the
> hangar (divisor); we want the names of the pilots who can fly every
> plane (quotient) in the hangar. To get this result, we divide the
> PilotSkills table by the planes in the hangar.
>
> CREATE TABLE PilotSkills
> (pilot CHAR(15) NOT NULL,
> plane CHAR(15) NOT NULL,
> PRIMARY KEY (pilot, plane));
>
> PilotSkills
> pilot plane
> =========================
> 'Celko' 'Piper Cub'
> 'Higgins' 'B-52 Bomber'
> 'Higgins' 'F-14 Fighter'
> 'Higgins' 'Piper Cub'
> 'Jones' 'B-52 Bomber'
> 'Jones' 'F-14 Fighter'
> 'Smith' 'B-1 Bomber'
> 'Smith' 'B-52 Bomber'
> 'Smith' 'F-14 Fighter'
> 'Wilson' 'B-1 Bomber'
> 'Wilson' 'B-52 Bomber'
> 'Wilson' 'F-14 Fighter'
> 'Wilson' 'F-17 Fighter'
>
> CREATE TABLE Hangar
> (plane CHAR(15) NOT NULL PRIMARY KEY);
>
> Hangar
> plane
> =============
> 'B-1 Bomber'
> 'B-52 Bomber'
> 'F-14 Fighter'
>
> PilotSkills DIVIDED BY Hangar
> pilot
> =============================
> 'Smith'
> 'Wilson'
>
> In this example, Smith and Wilson are the two pilots who can fly
> everything in the hangar. Notice that Higgins and Celko know how to fly
> a Piper Cub, but we don't have one right now. In Codd's original
> definition of relational division, having more rows than are called for
> is not a problem.
>
> The important characteristic of a relational division is that the CROSS
> JOIN (Cartesian product) of the divisor and the quotient produces a
> valid subset of rows from the dividend. This is where the name comes
> from, since the CROSS JOIN acts like a multiplication operator.
>
> Division with a Remainder
>
> There are two kinds of relational division. Division with a remainder
> allows the dividend table to have more values than the divisor, which
> was Codd's original definition. For example, if a pilot can fly more
> planes than just those we have in the hangar, this is fine with us. The
> query can be written in SQL-89 as
>
> SELECT DISTINCT pilot
> FROM PilotSkills AS PS1
> WHERE NOT EXISTS
> (SELECT *
> FROM Hangar
> WHERE NOT EXISTS
> (SELECT *
> FROM PilotSkills AS PS2
> WHERE (PS1.pilot = PS2.pilot)
> AND (PS2.plane = Hangar.plane)));
>
> The quickest way to explain what is happening in this query is to
> imagine an old World War II movie where a cocky pilot has just walked
> into the hangar, looked over the fleet, and announced, "There ain't no
> plane in this hangar that I can't fly!" We are finding the pilots for
> whom there does not exist a plane in the hangar for which they have no
> skills. The use of the NOT EXISTS() predicates is for speed. Most SQL
> systems will look up a value in an index rather than scan the whole
> table. The SELECT * clause lets the query optimizer choose the column
> to use when looking for the index.
>
> This query for relational division was made popular by Chris Date in his
> textbooks, but it is not the only method nor always the fastest.
> Another version of the division can be written so as to avoid three
> levels of nesting. While it is not original with me, I have made it
> popular in my books.
>
> SELECT PS1.pilot
> FROM PilotSkills AS PS1, Hangar AS H1
> WHERE PS1.plane = H1.plane
> GROUP BY PS1.pilot
> HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar);
>
> There is a serious difference in the two methods. Burn down the hangar,
> so that the divisor is empty. Because of the NOT EXISTS() predicates in
> Date's query, all pilots are returned from a division by an empty set.
> Because of the COUNT() functions in my query, no pilots are returned
> from a division by an empty set.
>
> In the sixth edition of his book, INTRODUCTION TO DATABASE SYSTEMS
> (Addison-Wesley; 1995 ;ISBN 0-201-82458-2), Chris Date defined another
> operator (DIVIDEBY ... PER) which produces the same results as my query,
> but with more complexity.
>
> Exact Division
>
> The second kind of relational division is exact relational
> division. The dividend table must match exactly to the values of
> the divisor without any extra values.
>
> SELECT PS1.pilot
> FROM PilotSkills AS PS1
> LEFT OUTER JOIN
> Hangar AS H1
> ON PS1.plane = H1.plane
> GROUP BY PS1.pilot
> HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar)
> AND COUNT(H1.plane) = (SELECT COUNT(plane) FROM Hangar);
>
> This says that a pilot must have the same number of certificates as
> there planes in the hangar and these certificates all match to a plane
> in the hangar, not something else. The "something else" is shown by a
> created NULL from the LEFT OUTER JOIN.
>
> Please do not make the mistake of trying to reduce the HAVING clause
> with a little algebra to:
>
> HAVING COUNT(PS1.plane) = COUNT(H1.plane)
>
> because it does not work; it will tell you that the hangar has (n)
> planes in it and the pilot is certified for (n) planes, but not that
> those two sets of planes are equal to each other.
>
> Note on Performance
>
> The nested EXISTS() predicates version of relational division was made
> popular by Chris Date's textbooks, while the author is associated with
> popularizing the COUNT(*) version of relational division. The Winter
> 1996 edition of DB2 ON-LINE MAGAZINE http://www.db2mag.com/96011ar:htm)
> had an article entitled "Powerful SQL:Beyond the Basics" by Sheryl
> Larsen which gave the results of testing both methods. Her conclusion
> for DB2 was that the nested EXISTS() version is better when the quotient
> has less than 25% of the dividend table's rows and the COUNT(*) version
> is better when the quotient is more than 25% of the dividend table.
>
> --CELKO--
>
>
> *** Sent via Developersdex http://www.developersdex.com ***
> Don't just participate in USENET...get rewarded for it!
- Next message: Roji. P. Thomas: "Re: Insert ?"
- Previous message: David Scemama: "Re: Filtering in an N to N relationship"
- In reply to: Joe Celko: "Re: Filtering in an N to N relationship"
- Next in thread: Chris2: "Re: Filtering in an N to N relationship"
- Messages sorted by: [ date ] [ thread ]