Re: Checking if a set exists already -- non-normalized to normalized

Tech-Archive recommends: Repair Windows Errors & Optimize Windows Performance

From: John Gilson (jag_at_acm.org)
Date: 08/09/04


Date: Mon, 09 Aug 2004 05:37:06 GMT


"Joe Celko" <jcelko212@earthlink.net> wrote in message news:erVgWoKfEHA.3412@TK2MSFTNGP11.phx.gbl...
> >> The problem is that I do not want to insert duplicate Boxes (i.e.
> boxes with the exact same set of products in it). <<
>
> This is called "Todd's Division" and it is usually worded as "who sells
> exactly the same parts as company X?" in the literature.
>
> The bad news ids that you will have to compare all the pairs of
> boxes+products in your table to find the duplicate sets. Her is a "cut
> & paste" from my current working draft of SQL FOR SMARTIES third
> edition:
>
> 19.02.04. Todd's Division
>
> A relational division operator proposed by Stephen Todd is
> defined on two tables with common columns that are joined together,
> dropping the JOIN column and retaining only those
> non-JOIN columns that meet a criterion.
>
> We are given a table, JobParts(jobno, partno), and another table,
> SupParts(supno, partno), of suppliers and the parts that they provide.
> We want to get the supplier-and-job pairs such that supplier sn supplies
> all of the parts needed for job jn. This is not quite the same thing as
> getting the supplier-and-job pairs such that job jn requires all of the
> parts provided by supplier sn.
>
> You want to divide the JobParts table by the SupParts table. A rule of
> thumb: The remainder comes from the dividend, but all values in the
> divisor are present.
>
> JobParts SupParts Result = JobSups
> job pno sno pno job sno
> ======== ============= ============
> 'j1' 'p1' 's1' 'p1' 'j1' 's1'
> 'j1' 'p2' 's1' 'p2' 'j1' 's2'
> 'j2' 'p2' 's1' 'p3' 'j2' 's1'
> 'j2' 'p4' 's1' 'p4' 'j2' 's4'
> 'j2' 'p5' 's1' 'p5' 'j3' 's1'
> 'j3' 'p2' 's1' 'p6' 'j3' 's2'
> 's2' 'p1' 'j3' 's3'
> 's2' 'p2' 'j3' 's4'
> 's3' 'p2'
> 's4' 'p2'
> 's4' 'p4'
> 's4' 'p5'
>
> Pierre Mullin submitted the following query to carry out the Todd
> division:
>
> SELECT DISTINCT JP1.job, SP1.supplier
> FROM JobParts AS JP1, SupParts AS SP1
> WHERE NOT EXISTS
> (SELECT *
> FROM JobParts AS JP2
> WHERE JP2.job = JP1.job
> AND JP2.part
> NOT IN (SELECT SP2.part
> FROM SupParts AS SP2
> WHERE SP2.supplier = SP1.supplier));
>
> This is really a modification of the query for Codd's division, extended
> to use a JOIN on both tables in the outermost SELECT statement. The IN
> predicate for the second subquery can be replaced with a NOT EXISTS
> predicate; it might run a bit faster, depending on the optimizer.
>
> Another related query is finding the pairs of suppliers who sell the
> same parts. In this data, that would be the pairs (s1, p2), (s3, p1),
> (s4, p1), (s5, p1)
>
> SELECT S1.sup, S2.sup
> FROM SupParts AS S1, SupParts AS S2
> WHERE S1.sup < S2.sup -- different suppliers
> AND S1.part = S2.part -- same parts
> GROUP BY S1.sup, S2.sup
> HAVING COUNT(*) = (SELECT COUNT (*) -- same count of parts
> FROM SupParts AS S3
> WHERE S3.sup = S1.sup)
> AND COUNT(*) = (SELECT COUNT (*)
> FROM SupParts AS S4
> WHERE S4.sup = S2.sup);

Another formulation of this would be

SELECT SP1.sup AS sup1, SP2.sup AS sup2
FROM (SELECT sup, COUNT(*) AS tally
             FROM SupParts
             GROUP BY sup) AS SP1
            INNER JOIN
            (SELECT sup, COUNT(*) AS tally
             FROM SupParts
             GROUP BY sup) AS SP2
            ON SP1.sup < SP2.sup AND
                   SP1.tally = SP2.tally
            INNER JOIN
            SupParts AS SP3
            ON SP3.sup = SP1.sup
            INNER JOIN
            SupParts AS SP4
            ON SP4.sup = SP2.sup AND
                   SP3.part = SP4.part
GROUP BY SP1.sup, SP1.tally, SP2.sup
HAVING COUNT(*) = SP1.tally;

--
JAG
> This can be modified into Todd's division easily be adding the
> restriction that the parts must also belong to a common job.
>
> Steve Kass came up with a specialized version that depends on using a
> numeric code.  Assume we have a table that tells us which players are on
> which teams.
>
> CREATE TABLE TeamAssignments
> (player_id INTEGER NOT NULL
>      REFERENCES Players(player_id)
>      ON DELETE CASCADE
>      ON UPDATE CASCADE,
>  team_id CHAR(5) NOT NULL
>      REFERENCES Teams(team_id)
>      ON DELETE CASCADE
>      ON UPDATE CASCADE,
>  PRIMARY KEY (player_id, team_id));
>
> To get pairs of Players on the same team:
>
> SELECT P1.player_id, P2.player_id
>   FROM Players AS P1, Players AS P2
> WHERE P1.player_id < P2.player_id
>  GROUP BY P1.player_id, P2.player_id
> HAVING P1.player_id + P2.player_id
>      = ALL (SELECT SUM(P3.player_id)
>               FROM TeamAssignments AS P3
>              WHERE P3.player_id IN (P1.player_id, P2.player_id)
>              GROUP BY P3.team_id);
>
> --CELKO--
>  ===========================
>  Please post DDL, so that people do not have to guess what the keys,
> constraints, Declarative Referential Integrity, datatypes, etc. in your
> schema are.
>
> *** Sent via Developersdex http://www.developersdex.com ***
> Don't just participate in USENET...get rewarded for it!


Relevant Pages

  • Re: object model to table design mapping problem
    ... the INCITS H2 Database Standards Committee(nee ANSI X3H2 ... NOT NULL PRIMARY KEY, ... REFERENCES Courses, ... ON UPDATE CASCADE ...
    (microsoft.public.sqlserver.programming)
  • Re: String Search
    ... Sample data is also a good idea, ... (proj_nbr INTEGER NOT NULL PRIMARY KEY, ... ON UPDATE CASCADE ... REFERENCES Projects ...
    (microsoft.public.sqlserver.programming)
  • Re: Deriving unique rows from historical data
    ... ON UPDATE CASCADE, ... REFERENCES Locations ... start_time DATETIME NOT NULL, ... Google how to code for this schema. ...
    (comp.databases.ms-sqlserver)
  • Extrend SP to check for Expiry date prior to updating?
    ... REFERENCES ( ... ON UPDATE CASCADE, ... set oConn = GetConnection ... set oRS = oCmd.execute ...
    (microsoft.public.inetserver.asp.db)
  • Re: ranking players in a competition
    ... The following query, based on your existing query, should do it: ... FROM YourFirstQuery AS Q2 ... BUT I NEED TO RANK ALL THE PLAYERS IN EACH TEAM. ... IN THE TEAM CLASIFICATION THEREFORE THE PLAYER WHOSE MARK IS HIGHER WILL BE ...
    (microsoft.public.access.queries)