Re: do all primary keys use autonumber
- From: Vincent Johns <vjohns@xxxxxxxxxxxxxxxxxx>
- Date: Sat, 19 Nov 2005 05:26:57 GMT
I ran the test you suggested. Perhaps I missed something (details below), but I wasn't impressed with the results I got. Please let me know if I did miss something here.
peregenem@xxxxxxxxxxxx wrote:
Vincent Johns wrote:
How does knowing the "physical order of the disk" (assuming you can determine what that is) help you do ordinary database operations?
Physical order of the disk (a.k.a the cluster index) for a table is determined by its PRIMARY KEY designation.
Here's a quick demo of why for Jet you must choose a PRIMARY KEY carefully.
I assume you have a 'sequence' table i.e. a table of integers e.g.
CREATE TABLE [Sequence] (seq INTEGER) ; INSERT INTO [Sequence] VALUES (1) ; INSERT INTO [Sequence] VALUES (2) ; INSERT INTO [Sequence] VALUES (3) ; etc etc.
In reality, my data was created in Excel and imported <g>.
You didn't say how many records you created. What I did was to put 100,000 records into [Sequence], with values of 1 to 100,000.
Now create two test tables:
CREATE TABLE Test1 ( key_col INTEGER IDENTITY(1, 1) NOT NULL, data_col INTEGER NOT NULL, PRIMARY KEY (key_col) ) ; CREATE TABLE Test2 ( key_col INTEGER IDENTITY(1, 1) NOT NULL, data_col INTEGER NOT NULL, PRIMARY KEY (data_col, key_col) ) ;
I set [Test1].[key_col] and [Test2].[key_col] to be incremental Autonumber (long integer) fields, and [Test1].[data_col] and [Test2].[data_col] to be Number (long integer) fields. For [Test1], I set the primary key field to be [key_col], and for [Test2] I set the primary key to include both fields.
Note the difference in PK designation between them.
Now load some data using the 'sequence table':
INSERT INTO Test1 (data_col) SELECT DT1.data_col FROM ( SELECT S1.seq MOD 100 AS data_col FROM [Sequence] AS S1 WHERE S1.seq <= 50000 UNION ALL SELECT S1.seq MOD 100 AS data_col FROM [Sequence] AS S1 WHERE S1.seq <= 50000 UNION ALL SELECT S1.seq MOD 100 AS data_col FROM [Sequence] AS S1 WHERE S1.seq <= 50000 UNION ALL SELECT S1.seq MOD 100 AS data_col FROM [Sequence] AS S1 WHERE S1.seq <= 50000 UNION ALL SELECT S1.seq MOD 100 AS data_col FROM [Sequence] AS S1 WHERE S1.seq <= 50000 ) AS DT1 ; INSERT INTO Test1 (data_col) SELECT DT1... (as before) ;
For the second version, I assume you really meant
INSERT INTO Test2 (data_col)
.. Your SQL didn't work on my Access 2000, so what I actually did was to define the following Queries:
[Q_ShowSeq] SQL: SELECT S1.seq MOD 100 AS data_col FROM Sequence AS S1 WHERE S1.seq<=50000;
[Q_UnionAll] SQL: SELECT * FROM Q_ShowSeq UNION ALL SELECT * FROM Q_ShowSeq UNION ALL SELECT * FROM Q_ShowSeq UNION ALL SELECT * FROM Q_ShowSeq UNION ALL SELECT * FROM Q_ShowSeq;
[Q_PopulateTest1] SQL: INSERT INTO Test1 ( data_col ) SELECT DT1.data_col FROM [SELECT * FROM Q_UnionAll ]. AS DT1;
and similarly for [Q_PopulateTest2].
This put 250,000 records into each of [Test1] and [Test2].
At this point, compact the file. This has the effect of physically rebuilding the tables based on their PRIMARY KEY designations.
To test the relative performance of the PRIMARY KEY choices, run a query which uses a BETWEEN construct because this favours the physical order:
SELECT key_col, data_col FROM Test1 WHERE data_col BETWEEN 10 AND 15 OR data_col BETWEEN 30 AND 35 OR data_col BETWEEN 50 AND 55 OR data_col BETWEEN 70 AND 75 OR data_col BETWEEN 90 AND 95 ; SELECT key_col, data_col FROM Test2 WHERE data_col BETWEEN 10 AND 15 OR data_col BETWEEN 30 AND 35 OR data_col BETWEEN 50 AND 55 OR data_col BETWEEN 70 AND 75 OR data_col BETWEEN 90 AND 95
In my test timings, table with PRIMARY KEY (key_col) is about 10 times slower than the table with PRIMARY KEY (data_col, key_col).
My version of your Queries (essentially identical to yours) was
[Q_SelectTest1] SQL: SELECT [key_col], [data_col] FROM Test1 WHERE data_col Between 10 And 15 Or data_col Between 30 And 35 Or data_col Between 50 And 55 Or data_col Between 70 And 75 Or data_col Between 90 And 95 ;
and similarly for [Q_SelectTest2].
In my test timings, after compacting the database, both versions completed essentially instantly (for example, 1.23E-08) as measured by the Time function (system time, = days since midnight), so both were apparently completed in a few milliseconds or less.
But looking at your description, even assuming I had included enough records to be able to notice a difference, it appears that you forgot to INDEX the [Test1].[data_col] field! Declaring the primary key to include only [Test1].[key_col], omitting the field on which you were filtering the records, would also omit putting an index onto the other field. I imagine that that's why the Query took longer when you ran it.
This is exactly what I was referring to earlier, specifically the problem of FORGETTING to index a field that one needs to use for sorting or filtering a recordset.
Choosing an index that fails to match the physical record order on disk pales in comparison to this type of omission.
Bear in mind that putting lots of indices onto a Table may not improve performance. Indices improve performance when searching and sorting, but they reduce performance with updating (since each index must be updated when a record is changed).
In summary, although it's perhaps useful to know that a primary key gets slightly special treatment from Access, the improvement in performance that you might experience is likely to be pretty marginal. Far more important, IMHO, is organizing the information (via primary keys; indices; helpfully named Tables, fields, and Queries; properly human-engineered Forms; etc.) in your database so that it will do the job that you need done and will be easy to maintain. Worrying too much about internal implementation or "efficiency" can be counterproductive, if it distracts you from the imporant parts of the design. (Worse than any of this, of course, is failing to make the database model the real-world system it's supposed to model, but that's another story.)
-- Vincent Johns <vjohns@xxxxxxxxxxxxxxxxxx> Please feel free to quote anything I say here. .
- References:
- Re: do all primary keys use autonumber
- From: Vincent Johns
- Re: do all primary keys use autonumber
- From: peregenem
- Re: do all primary keys use autonumber
- From: Vincent Johns
- Re: do all primary keys use autonumber
- From: peregenem
- Re: do all primary keys use autonumber
- Prev by Date: Re: Finding ACCESS Group and/or consultant
- Next by Date: Re: do all primary keys use autonumber
- Previous by thread: Re: do all primary keys use autonumber
- Next by thread: Re: do all primary keys use autonumber
- Index(es):
Relevant Pages
|