Re: Question on choosing primary key; real or arbitrary

Tech Tip: Click here to run a free scan for Windows Errors and optimize PC performance

From: Andrew John (aj_at_DELETEmistrose.com)
Date: 02/25/04


Date: Thu, 26 Feb 2004 04:45:11 +1100

Eric,

I can think of 2 reasons to introduce a surrogate key into a datawarehouse
1) Storage is really tight and you have a lot of foreign keys
2) Your clustered index does not grow at the end of the table, and the inserts
    as a result are too slow and fragmented.

I don't think either of those reasons apply here. I would make ROW_STAMP
( you do like capitals don't you ? ) the first column in the clustered index, so that
the table grows on the end. Especially as you say it's always in the where clause.

Regards
 AJ

"Eric Sabine" <mopar41@____h_o_t_m_a_i_l_._ScPoAmM> wrote in message news:%23QOyY17%23DHA.2860@tk2msftngp13.phx.gbl...
> I have a database that will be used to "warehouse" (not a big DW, rather a
> single table DW) data on inventory. Each night after midnight, data will be
> loaded into the table on the prior day's ending data. My question is really
> about choosing the better primary key. While a PK can be created with 4 of
> the table's columns, it is the size of the PK which I wonder about. If I
> choose these 4 columns, my PK byte size is 38 bytes. Rather if I add an
> arbitrary INT IDENTITY to the table, my PK size becomes 4 bytes which
> translates to more rows per page for my non-clustered indexes. Note the
> table would then receive a second unique index thus there would be 2. Below
> is the schema and some assumptions about the uniqueness of each column. I'm
> simply looking for comments about choosing which schema for choosing the PK
> is better. As stated, this is for warehousing. This table will only be
> used for (a) uploading about 25k rows per day and (b) reporting (always with
> ROW_STAMP in the WHERE CLAUSE both as a range and an equals).
>
> Thanks,
> Eric
>
>
> IF OBJECT_ID('INVENTORY_DATA1') IS NOT NULL
> DROP TABLE INVENTORY_DATA1
> GO
> USE ADW
> GO
> CREATE TABLE dbo.INVENTORY_DATA1 (
> --PKID INT IDENTITY NOT NULL, -- total possibilities = 21,000 to 25,000
> per day over 15 years; up to 136,875,000
> DB VARCHAR(6) NOT NULL, -- total possibilities = 7 to 8
> PART_CODE VARCHAR(20) NOT NULL, -- total possibilities = 15,000 to 20,000
> WGH_AVG_COST DECIMAL(19,6) NOT NULL, -- total possibilities = infinite
> EXIST_GL_ACC CHAR(8) NOT NULL, -- total possibilities = 20 to 30
> LOC_CODE VARCHAR(10) NOT NULL, -- total possibilities = under 10
> QTY_ONH DECIMAL(19,3) NOT NULL, -- total possibilities = infinite
> ROW_STAMP SMALLDATETIME NOT NULL -- total possibilities = 365 per year
> over 15 years; 5475
> CONSTRAINT DF_INVENTORY_DATA_ROW_STAMP1
> DEFAULT(CONVERT(SMALLDATETIME,CONVERT(VARCHAR,CURRENT_TIMESTAMP - 1,112)))
> )
>
> GO
>
> -- primary key A (used if PKID doesn't exist)
> ALTER TABLE dbo.INVENTORY_DATA1
> ADD CONSTRAINT PK_INVENTORY_DATA1
> PRIMARY KEY NONCLUSTERED (DB, PART_CODE, LOC_CODE, ROW_STAMP) --6 + 20 + 10
> + 2 = 38 bytes
> GO
>
> -- primary key B (used if PKID does exist)
> ALTER TABLE dbo.INVENTORY_DATA1
> ADD CONSTRAINT PK_INVENTORY_DATA1_PKID PRIMARY KEY NONCLUSTERED (PKID) --
> 4 bytes
> -- An additional unique index if primary key B is us
> CREATE UNIQUE INDEX UI_INVENTORY_DATA1 ON dbo.INVENTORY_DATA1 (DB,
> PART_CODE, LOC_CODE, ROW_STAMP)
>
> -- The clustered index
> CREATE CLUSTERED INDEX CLI_INVENTORY_DATA1 ON dbo.INVENTORY_DATA1 (DB,
> ROW_STAMP, PART_CODE) -- 6 + 2 + 20 = 28 bytes
>
>
> -- Other indexes for certain queries
> CREATE INDEX NCI_INVENTORY_DATA1_PART_DATA1 ON dbo.INVENTORY_DATA1
> (PART_CODE, WGH_AVG_COST)
> CREATE INDEX NCI_INVENTORY_DATA1_PART_DATA2 ON dbo.INVENTORY_DATA1
> (PART_CODE, LOC_CODE, QTY_ONH)
> GO
> -- Indexed View for a certain report
> CREATE VIEW INVENTORY_DETAIL WITH SCHEMABINDING
> AS
> SELECT ROW_STAMP, EXIST_GL_ACC, SUM(WGH_AVG_COST * QTY_ONH) AS DETAIL,
> COUNT_BIG(*) AS CNT
> FROM dbo.INVENTORY_DATA1
> GROUP BY ROW_STAMP, EXIST_GL_ACC
> GO
>
> CREATE UNIQUE CLUSTERED INDEX CLI_INVENTORY_DETAIL ON INVENTORY_DETAIL
> (ROW_STAMP, EXIST_GL_ACC)
> GO
>
> DROP VIEW INVENTORY_DETAIL
> DROP TABLE INVENTORY_DATA1
>
>



Relevant Pages

  • Re: Relational-to-OOP Tax
    ... well-designed system. ... In what way would that affect the other old applications? ... "Schemas change for different reasons ... Coupling the application to the schema makes ...
    (comp.object)
  • Re: Critique of Robert C. Martins "Agile Principles, Patterns, and Practices"
    ... "The database schema and the application logic change for different ... isn't precisely because your code and schema are too tightly coupled. ... solution won't permit it) or a design flaw. ... for different reasons" instead of word it as a fact. ...
    (comp.object)
  • Re: Relational-to-OOP Tax
    ... remains that the schema and applicaton code change for different ... different reasons than application code". ... Multiple applications may use the same ... A change in the schema to support application ...
    (comp.object)
  • Re: Critique of Robert C. Martins "Agile Principles, Patterns, and Practices"
    ... due to upgrades to the COTS system that owns ... they do most of the app changes, ... You conveniently ignored the other two examples of schema change ... application change for different reasons is unfounded. ...
    (comp.object)
  • Re: OT: Damsel, Ophelia, I Need a Mom
    ... Tell her flat out "This behaviour is no longer acceptible to ... cutting off a family member almost always has ... Again, if it works for you, fine, but there are good reasons why it ... they are *choosing* for whatever reasons, ...
    (rec.food.cooking)