Re: Question on choosing primary key; real or arbitrary
From: Andrew John (aj_at_DELETEmistrose.com)
Date: 02/25/04
- Next message: Aaron Bertrand [MVP]: "Re: Table Contents Last Changed"
- Previous message: Aaron Bertrand [MVP]: "Re: how to get the table schema?"
- In reply to: Eric Sabine: "Question on choosing primary key; real or arbitrary"
- Next in thread: --CELKO--: "Re: Question on choosing primary key; real or arbitrary"
- Messages sorted by: [ date ] [ thread ]
Date: Thu, 26 Feb 2004 04:45:11 +1100
Eric,
I can think of 2 reasons to introduce a surrogate key into a datawarehouse
1) Storage is really tight and you have a lot of foreign keys
2) Your clustered index does not grow at the end of the table, and the inserts
as a result are too slow and fragmented.
I don't think either of those reasons apply here. I would make ROW_STAMP
( you do like capitals don't you ? ) the first column in the clustered index, so that
the table grows on the end. Especially as you say it's always in the where clause.
Regards
AJ
"Eric Sabine" <mopar41@____h_o_t_m_a_i_l_._ScPoAmM> wrote in message news:%23QOyY17%23DHA.2860@tk2msftngp13.phx.gbl...
> I have a database that will be used to "warehouse" (not a big DW, rather a
> single table DW) data on inventory. Each night after midnight, data will be
> loaded into the table on the prior day's ending data. My question is really
> about choosing the better primary key. While a PK can be created with 4 of
> the table's columns, it is the size of the PK which I wonder about. If I
> choose these 4 columns, my PK byte size is 38 bytes. Rather if I add an
> arbitrary INT IDENTITY to the table, my PK size becomes 4 bytes which
> translates to more rows per page for my non-clustered indexes. Note the
> table would then receive a second unique index thus there would be 2. Below
> is the schema and some assumptions about the uniqueness of each column. I'm
> simply looking for comments about choosing which schema for choosing the PK
> is better. As stated, this is for warehousing. This table will only be
> used for (a) uploading about 25k rows per day and (b) reporting (always with
> ROW_STAMP in the WHERE CLAUSE both as a range and an equals).
>
> Thanks,
> Eric
>
>
> IF OBJECT_ID('INVENTORY_DATA1') IS NOT NULL
> DROP TABLE INVENTORY_DATA1
> GO
> USE ADW
> GO
> CREATE TABLE dbo.INVENTORY_DATA1 (
> --PKID INT IDENTITY NOT NULL, -- total possibilities = 21,000 to 25,000
> per day over 15 years; up to 136,875,000
> DB VARCHAR(6) NOT NULL, -- total possibilities = 7 to 8
> PART_CODE VARCHAR(20) NOT NULL, -- total possibilities = 15,000 to 20,000
> WGH_AVG_COST DECIMAL(19,6) NOT NULL, -- total possibilities = infinite
> EXIST_GL_ACC CHAR(8) NOT NULL, -- total possibilities = 20 to 30
> LOC_CODE VARCHAR(10) NOT NULL, -- total possibilities = under 10
> QTY_ONH DECIMAL(19,3) NOT NULL, -- total possibilities = infinite
> ROW_STAMP SMALLDATETIME NOT NULL -- total possibilities = 365 per year
> over 15 years; 5475
> CONSTRAINT DF_INVENTORY_DATA_ROW_STAMP1
> DEFAULT(CONVERT(SMALLDATETIME,CONVERT(VARCHAR,CURRENT_TIMESTAMP - 1,112)))
> )
>
> GO
>
> -- primary key A (used if PKID doesn't exist)
> ALTER TABLE dbo.INVENTORY_DATA1
> ADD CONSTRAINT PK_INVENTORY_DATA1
> PRIMARY KEY NONCLUSTERED (DB, PART_CODE, LOC_CODE, ROW_STAMP) --6 + 20 + 10
> + 2 = 38 bytes
> GO
>
> -- primary key B (used if PKID does exist)
> ALTER TABLE dbo.INVENTORY_DATA1
> ADD CONSTRAINT PK_INVENTORY_DATA1_PKID PRIMARY KEY NONCLUSTERED (PKID) --
> 4 bytes
> -- An additional unique index if primary key B is us
> CREATE UNIQUE INDEX UI_INVENTORY_DATA1 ON dbo.INVENTORY_DATA1 (DB,
> PART_CODE, LOC_CODE, ROW_STAMP)
>
> -- The clustered index
> CREATE CLUSTERED INDEX CLI_INVENTORY_DATA1 ON dbo.INVENTORY_DATA1 (DB,
> ROW_STAMP, PART_CODE) -- 6 + 2 + 20 = 28 bytes
>
>
> -- Other indexes for certain queries
> CREATE INDEX NCI_INVENTORY_DATA1_PART_DATA1 ON dbo.INVENTORY_DATA1
> (PART_CODE, WGH_AVG_COST)
> CREATE INDEX NCI_INVENTORY_DATA1_PART_DATA2 ON dbo.INVENTORY_DATA1
> (PART_CODE, LOC_CODE, QTY_ONH)
> GO
> -- Indexed View for a certain report
> CREATE VIEW INVENTORY_DETAIL WITH SCHEMABINDING
> AS
> SELECT ROW_STAMP, EXIST_GL_ACC, SUM(WGH_AVG_COST * QTY_ONH) AS DETAIL,
> COUNT_BIG(*) AS CNT
> FROM dbo.INVENTORY_DATA1
> GROUP BY ROW_STAMP, EXIST_GL_ACC
> GO
>
> CREATE UNIQUE CLUSTERED INDEX CLI_INVENTORY_DETAIL ON INVENTORY_DETAIL
> (ROW_STAMP, EXIST_GL_ACC)
> GO
>
> DROP VIEW INVENTORY_DETAIL
> DROP TABLE INVENTORY_DATA1
>
>
- Next message: Aaron Bertrand [MVP]: "Re: Table Contents Last Changed"
- Previous message: Aaron Bertrand [MVP]: "Re: how to get the table schema?"
- In reply to: Eric Sabine: "Question on choosing primary key; real or arbitrary"
- Next in thread: --CELKO--: "Re: Question on choosing primary key; real or arbitrary"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|