Re: Relational Division (?) - establishing Contains / Intersects
From: Damien Laffan (iGetEnoughSpamAlreadyThankyou_at_nospam.com)
Date: 05/24/04
- Next message: Josh: "PLEASE HELP!!!! I beleive I have written the most inefficient query ever!!!!"
- Previous message: Stephen: "Re: Trigger to call a DLL or SQL function."
- In reply to: Joe Celko: "Re: Relational Division (?) - establishing Contains / Intersects"
- Next in thread: Damien Laffan: "Re: Relational Division (?) - establishing Contains / Intersects"
- Messages sorted by: [ date ] [ thread ]
Date: Tue, 25 May 2004 09:16:57 +1000
Joe,
Can you please confirm that this DDL is what you meant to post, As it only allows for a single location in a given region.
CREATE TABLE Regions
(region_name VARCHAR(20) NOT NULL PRIMARY KEY,
region_id INTEGER NOT NULL UNIQUE,
location_id INTEGER NOT NULL
REFERENCES Locations (location_id)
ON UPDATE CASCADE
ON DELETE CASCADE,
PRIMARY KEY (location_id, region_id));
Perhaps some sloppy default cut and pasting is to blame ;)
Regards,
Damien.
"Joe Celko" <jcelko212@earthlink.net> wrote in message news:uPWjB9ZQEHA.2404@TK2MSFTNGP12.phx.gbl...
In the US, we have a FIPS (Federal Information Processing Standards)
numeric code for each political unit down to the county level; there are
Census _codes down to the individual blocks and streets. I assume that
your location_id is the same kind of thing, not some unverifiable
sequential number.
Do you really have names that are fifty letters long?? If you were
Welsh, I'd believe it, but it looks like sloppy default programming
here.
CREATE TABLE Locations
(location_id INTEGER NOT NULL UNIQUE, -- Aussie standard code?
location_name VARCHAR(20) NOT NULL,
post_code CHAR(4) NOT NULL,
state CHAR(3) NOT NULL,
country_code CHAR(2) DEFAULT 'AU' NOT NULL,
PRIMARY KEY (location_name, post_code, state, country_code));
Regions are made of locations, so they need to be together; it is easier
to make the table that holds the complete fact. Or use a VIEW based on
the original schema.
CREATE TABLE Regions
(region_name VARCHAR(20) NOT NULL PRIMARY KEY,
region_id INTEGER NOT NULL UNIQUE,
location_id INTEGER NOT NULL
REFERENCES Locations (location_id)
ON UPDATE CASCADE
ON DELETE CASCADE,
PRIMARY KEY (location_id, region_id));
There are two kinds of relational division. Division with a remainder
allows the dividend table to have more values than the divisor, which
was Codd's original definition. For example, if a pilot can fly more
planes than just those we have in the hangar, this is fine with us.
The query can be written as
SELECT DISTINCT region_id
FROM Regions AS R1
WHERE NOT EXISTS
(SELECT *
FROM Locations
WHERE NOT EXISTS
(SELECT *
FROM Regions AS R2
WHERE (R1.region_id = R2.region_id)
AND (R2.location_id = Locations.location_id)));
This query for relational division was made popular by Chris Date in his
textbooks, but it is not the only method nor always the fastest.
Another version of the division can be written so as to avoid three
levels of nesting. While it is not original with me, I have made it
popular in my books.
SELECT R1.region_id
FROM Regions AS R1, Locations AS L1
WHERE R1.location_id = L1.location_id
GROUP BY R1.region_id
HAVING COUNT(R1.location_id) = (SELECT COUNT(location_id) FROM
Locations);
There is a serious difference in the two methods. Burn down the
Locations, so that the divisor is empty. Because of the NOT EXISTS()
predicates in Date's query, all region_ids are returned from a division
by an empty set. Because of the COUNT() functions in my query, no
region_ids are returned from a division by an empty set.
In the sixth edition of his book, INTRODUCTION TO DATABASE SYSTEMS
(Addison-Wesley; 1995 ;ISBN 0-201-82458-2), Chris Date defined another
operator (DIVIDEBY ... PER) which produces the same results as my query,
but with more complexity.
Exact Division
The second kind of relational division is exact relational division.
The dividend table must match exactly to the values of the divisor
without any extra values.
SELECT R1.region_id, R1.region_name
FROM Regions AS R1
LEFT OUTER JOIN
Locations AS L1
ON R1.location_id = L1.location_id
GROUP BY R1.region_id
HAVING COUNT(R1.location_id) = (SELECT COUNT(location_id) FROM
Locations)
AND COUNT(L1.location_id) = (SELECT COUNT(location_id) FROM
Locations);
This says that a region_id must have the same number of certificates as
there location_ids in the Locations and these certificates all match to
a location_id in the Locations, not something else. The "something
else" is shown by a created NULL from the LEFT OUTER JOIN.
Please do not make the mistake of trying to reduce the HAVING clause
with a little algebra to:
HAVING COUNT(R1.location_id) = COUNT(L1.location_id)
because it does not work; it will tell you that the Locations has (n)
location_ids in it and the region_id is certified for (n) location_ids,
but not that those two sets of location_ids are equal to each other.
Note on Performance
The nested EXISTS() predicates version of relational division was made
popular by Chris Date's textbooks, while the author is associated with
popularizing the COUNT(*) version of relational division. The Winter
1996 edition of DB2 ON-LINE MAGAZINE (http://www.db2mag.com/96011ar:htm)
had an article entitled "Powerful SQL:Beyond the Basics" by Sheryl
Larsen which gave the results of testing both methods. Her conclusion
for DB2 was that the nested EXISTS() version is better when the quotient
has less than 25% of the dividend table's rows and the COUNT(*) version
is better when the quotient is more than 25% of the dividend table.
This somewhat complicated relational division is due to Richard Romley
at Salomon Smith Barney. The original problem deals with two tables.
The first table has a list of managers and the projects they can manage.
The second table has a list of Personnel, their departments and the
project to which they are assigned. Each employee is assigned to one
and only one department and each employee works on one and only one
project at a time. But a department can have several different projects
at the same time, so a single project can span several departments.
CREATE TABLE MgrProjects
(mgr CHAR(2) NOT NULL,
project CHAR(2) NOT NULL,
PRIMARY KEY(mgr, project));
Pardon me while I lapse into Standard SQL:
INSERT INTO Mgr_Project
VALUES ('M1', 'P1'), ('M1', 'P3'),
('M2', 'P2'), ('M2', 'P3'),
('M3', 'P2'),
('M4', 'P1'), ('M4', 'P2'), ('M4', 'P3');
CREATE TABLE Personnel
(emp CHAR(2) NOT NULL,
dept CHAR(2) NOT NULL,
project CHAR(2) NOT NULL,
UNIQUE (emp, project),
UNIQUE (emp, dept),
PRIMARY KEY (emp, dept, project));
INSERT INTO Personnel
VALUES ('Al', 'D1', 'P1'),
('Bob', 'D1', 'P1'),
('Carl', 'D1', 'P1'),
('Don', 'D1', 'P2'),
('Ed', 'D1', 'P2'),
('Frank', 'D1', 'P2'),
('George', 'D1', 'P2');
INSERT INTO Personnel
VALUES ('Harry', 'D2', 'P2'),
('Jack', 'D2', 'P2'),
('Larry', 'D2', 'P2'),
('Mike', 'D2', 'P2'),
('Nat', 'D2', 'P2');
INSERT INTO Personnel
VALUES ('Oscar', 'D3', 'P2'),
('Pat', 'D3', 'P2'),
('Rich', 'D3', 'P3');
As a side note, look at the use of UNIQUE constraints to enforce the
data integrity rules of the problem. That is a very useful trick.
The problem is to generate a report showing for each manager each
department whether is he qualified to manage none, some or all of the
projects being worked on within the department. To find who can manage
some, but not all, of the projects, use a version of relational
division.
SELECT M1.mgr, P1.dept
FROM MgrProjects AS M1
CROSS JOIN
Personnel AS P1
WHERE M1.project = P1.project
GROUP BY M1.mgr, P1.dept
HAVING COUNT(*) <> (SELECT COUNT(emp)
FROM Personnel AS P2
WHERE P2.dept = P1.dept);
The query is simply a relational division with a <> instead of an = in
the HAVING clause. Richard came back with a modification of my answer
which uses a characteristic function inside a single aggregate function.
SELECT DISTINCT M1.mgr, P1.dept
FROM (MgrProjects AS M1
INNER JOIN
Personnel AS P1
ON M1.project = P1.project)
INNER JOIN
Personnel AS P2
ON P1.dept = P2.dept
GROUP BY M1.mgr, P1.dept, P2.project
HAVING MAX (CASE WHEN M1.project = P2.project
THEN 1 ELSE 0 END) = 0;
This query uses a characteristic function while my original version
compares a count of Personnel under each manager to a count of Personnel
under each project. The use of "GROUP BY M1.mgr, P1.dept, P2.project"
with the "SELECT DISTINCT M1.mgr, P1.dept" is really the tricky part in
this new query. What we have is a three dimensional space with the (x,
y, z) axis representing (mgr, dept, project) and then we reduce it to
two
dimensions (mgr, dept) by seeing if Personnel on shared projects cover
the department or not.
That observation lead to the next changes. We can build a table which
shows each combination of manager, department and the level of authority
they have over the projects they have in common. That is the derived
table T1 in the following query; authority = 1 means the manager is not
on the project and authority = 2 means that he is on the project
SELECT T1.mgr, T1.dept,
CASE SUM(T1.authority)
WHEN 1 THEN 'None'
WHEN 2 THEN 'All'
WHEN 3 THEN 'Some'
ELSE NULL END AS power
FROM (SELECT DISTINCT M1.mgr, P1.dept,
MAX (CASE WHEN M1.project = P1.project
THEN 2 ELSE 1 END) AS authority
FROM MgrProjects AS M1
CROSS JOIN
Personnel AS P1
GROUP BY m.mgr, P1.dept, P1.project) AS T1
GROUP BY T1.mgr, T1.dept;
Another version:
SELECT PS1.pilot,
CASE WHEN COUNT(PS1.plane) > (SELECT COUNT(plane) FROM Hanger)
AND COUNT(H1.plane) = (SELECT COUNT(plane)FROM Hanger)
THEN 'more than all'
WHEN COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hanger)
AND COUNT(H1.plane) = (SELECT COUNT(plane) FROM Hanger)
THEN 'exactly all '
WHEN MIN(H1.plane) IS NULL
THEN 'none '
ELSE 'some ' END AS skill_level
FROM PilotSkills AS PS1
LEFT OUTER JOIN
Hanger AS H1
ON PS1.plane = H1.plane
GROUP BY PS1.pilot;
--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.
*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
- Next message: Josh: "PLEASE HELP!!!! I beleive I have written the most inefficient query ever!!!!"
- Previous message: Stephen: "Re: Trigger to call a DLL or SQL function."
- In reply to: Joe Celko: "Re: Relational Division (?) - establishing Contains / Intersects"
- Next in thread: Damien Laffan: "Re: Relational Division (?) - establishing Contains / Intersects"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|