Re: Checking if a set exists already -- non-normalized to normalized
From: John Gilson (jag_at_acm.org)
Date: 08/09/04
- Next message: Uri Dimant: "Re: DIfference of two tables"
- Previous message: Uri Dimant: "Re: Help with Query"
- In reply to: Joe Celko: "Re: Checking if a set exists already -- non-normalized to normalized"
- Messages sorted by: [ date ] [ thread ]
Date: Mon, 09 Aug 2004 05:37:06 GMT
"Joe Celko" <jcelko212@earthlink.net> wrote in message news:erVgWoKfEHA.3412@TK2MSFTNGP11.phx.gbl...
> >> The problem is that I do not want to insert duplicate Boxes (i.e.
> boxes with the exact same set of products in it). <<
>
> This is called "Todd's Division" and it is usually worded as "who sells
> exactly the same parts as company X?" in the literature.
>
> The bad news ids that you will have to compare all the pairs of
> boxes+products in your table to find the duplicate sets. Her is a "cut
> & paste" from my current working draft of SQL FOR SMARTIES third
> edition:
>
> 19.02.04. Todd's Division
>
> A relational division operator proposed by Stephen Todd is
> defined on two tables with common columns that are joined together,
> dropping the JOIN column and retaining only those
> non-JOIN columns that meet a criterion.
>
> We are given a table, JobParts(jobno, partno), and another table,
> SupParts(supno, partno), of suppliers and the parts that they provide.
> We want to get the supplier-and-job pairs such that supplier sn supplies
> all of the parts needed for job jn. This is not quite the same thing as
> getting the supplier-and-job pairs such that job jn requires all of the
> parts provided by supplier sn.
>
> You want to divide the JobParts table by the SupParts table. A rule of
> thumb: The remainder comes from the dividend, but all values in the
> divisor are present.
>
> JobParts SupParts Result = JobSups
> job pno sno pno job sno
> ======== ============= ============
> 'j1' 'p1' 's1' 'p1' 'j1' 's1'
> 'j1' 'p2' 's1' 'p2' 'j1' 's2'
> 'j2' 'p2' 's1' 'p3' 'j2' 's1'
> 'j2' 'p4' 's1' 'p4' 'j2' 's4'
> 'j2' 'p5' 's1' 'p5' 'j3' 's1'
> 'j3' 'p2' 's1' 'p6' 'j3' 's2'
> 's2' 'p1' 'j3' 's3'
> 's2' 'p2' 'j3' 's4'
> 's3' 'p2'
> 's4' 'p2'
> 's4' 'p4'
> 's4' 'p5'
>
> Pierre Mullin submitted the following query to carry out the Todd
> division:
>
> SELECT DISTINCT JP1.job, SP1.supplier
> FROM JobParts AS JP1, SupParts AS SP1
> WHERE NOT EXISTS
> (SELECT *
> FROM JobParts AS JP2
> WHERE JP2.job = JP1.job
> AND JP2.part
> NOT IN (SELECT SP2.part
> FROM SupParts AS SP2
> WHERE SP2.supplier = SP1.supplier));
>
> This is really a modification of the query for Codd's division, extended
> to use a JOIN on both tables in the outermost SELECT statement. The IN
> predicate for the second subquery can be replaced with a NOT EXISTS
> predicate; it might run a bit faster, depending on the optimizer.
>
> Another related query is finding the pairs of suppliers who sell the
> same parts. In this data, that would be the pairs (s1, p2), (s3, p1),
> (s4, p1), (s5, p1)
>
> SELECT S1.sup, S2.sup
> FROM SupParts AS S1, SupParts AS S2
> WHERE S1.sup < S2.sup -- different suppliers
> AND S1.part = S2.part -- same parts
> GROUP BY S1.sup, S2.sup
> HAVING COUNT(*) = (SELECT COUNT (*) -- same count of parts
> FROM SupParts AS S3
> WHERE S3.sup = S1.sup)
> AND COUNT(*) = (SELECT COUNT (*)
> FROM SupParts AS S4
> WHERE S4.sup = S2.sup);
Another formulation of this would be
SELECT SP1.sup AS sup1, SP2.sup AS sup2
FROM (SELECT sup, COUNT(*) AS tally
FROM SupParts
GROUP BY sup) AS SP1
INNER JOIN
(SELECT sup, COUNT(*) AS tally
FROM SupParts
GROUP BY sup) AS SP2
ON SP1.sup < SP2.sup AND
SP1.tally = SP2.tally
INNER JOIN
SupParts AS SP3
ON SP3.sup = SP1.sup
INNER JOIN
SupParts AS SP4
ON SP4.sup = SP2.sup AND
SP3.part = SP4.part
GROUP BY SP1.sup, SP1.tally, SP2.sup
HAVING COUNT(*) = SP1.tally;
-- JAG > This can be modified into Todd's division easily be adding the > restriction that the parts must also belong to a common job. > > Steve Kass came up with a specialized version that depends on using a > numeric code. Assume we have a table that tells us which players are on > which teams. > > CREATE TABLE TeamAssignments > (player_id INTEGER NOT NULL > REFERENCES Players(player_id) > ON DELETE CASCADE > ON UPDATE CASCADE, > team_id CHAR(5) NOT NULL > REFERENCES Teams(team_id) > ON DELETE CASCADE > ON UPDATE CASCADE, > PRIMARY KEY (player_id, team_id)); > > To get pairs of Players on the same team: > > SELECT P1.player_id, P2.player_id > FROM Players AS P1, Players AS P2 > WHERE P1.player_id < P2.player_id > GROUP BY P1.player_id, P2.player_id > HAVING P1.player_id + P2.player_id > = ALL (SELECT SUM(P3.player_id) > FROM TeamAssignments AS P3 > WHERE P3.player_id IN (P1.player_id, P2.player_id) > GROUP BY P3.team_id); > > --CELKO-- > =========================== > Please post DDL, so that people do not have to guess what the keys, > constraints, Declarative Referential Integrity, datatypes, etc. in your > schema are. > > *** Sent via Developersdex http://www.developersdex.com *** > Don't just participate in USENET...get rewarded for it!
- Next message: Uri Dimant: "Re: DIfference of two tables"
- Previous message: Uri Dimant: "Re: Help with Query"
- In reply to: Joe Celko: "Re: Checking if a set exists already -- non-normalized to normalized"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|