Re: Validating Surrogate Keys

Tech-Archive recommends: Repair Windows Errors & Optimize Windows Performance

From: Dave Wickert [MSFT] (dwickert_at_online.microsoft.com)
Date: 11/09/04


Date: Tue, 9 Nov 2004 15:53:50 -0800

Ref: the importance of partitioning -- two resources.
1) There is an entire chapter in the SQL Server 2000 Resource Kit on this
topic (Chapter 18: Using Partitions in a SQL Server 2000 Data Warehouse) --
also at:
http://msdn.microsoft.com/library/default.asp?url=/library/techart/partitionsindw.htm
2) SSAS Performance Guide:
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ansvcspg.mspx

-- 
Dave Wickert [MSFT]
dwickert@online.microsoft.com
Program Manager
BI SystemsTeam
SQL BI Product Unit (Analysis Services)
--
This posting is provided "AS IS" with no warranties, and confers no rights.
"TP" <thom@pantazi.net> wrote in message
news:7d353aba.0411090339.495ccde7@posting.google.com...
> Dave,
>
> Thanks for the great ideas.  We do not delete from our dimensions so
> the timestamp function would absolutely work.  Having said that, my
> DBA mentioned using partitioned data.  We have never done this, can
> you point me to a good resource describing data partitioning on SQL
> 2000?
>
> Again, I appreciate your input.  One final question that I know I am
> going to be asked is "Is this method an industry standard?"  If it is,
> could you possibly refer to support documentation?  If not, do you
> know if there is a standard method for ensuring referential integrity
> in data warehouses?
>
> Thom
>
> "Dave Wickert [MSFT]" <dwickert@online.microsoft.com> wrote in message
news:<uU038FhxEHA.536@TK2MSFTNGP10.phx.gbl>...
> > Sure. Assuming that have an index built on the dimension tables using
the
> > surrogate keys, it should be fairly straightforward to do an outer join
and
> > look for NULL keys being returned. Since you only really want to know if
it
> > exists in the dimension table, make sure that you only use the surrogate
> > itself -- in which case, it should be an index scan of the dimension
> > table -- which should run quickly.
> >
> > Assuming that you never delete dimension table entries, then you could
have
> > it run even faster if you timestamp the fact table. Once you've scanned
a
> > fact table entry, you never need to go back and check it again (that
would
> > be the best case).
> >
> > Alternatively you could look at partitioning your fact tables. Again,
this
> > would be a mechanism to ensure that you only scan a fact table record
once.
> > If partitioned into a separate table, for example, one table per month,
then
> > you would only need to validate the latest month.
> >
> > Just an idea.
> >
> > -- 
> > Dave Wickert [MSFT]
> > dwickert@online.microsoft.com
> > Program Manager
> > BI SystemsTeam
> > SQL BI Product Unit (Analysis Services)
> > --
> > This posting is provided "AS IS" with no warranties, and confers no
rights.
> >
> > "TP" <thom@pantazi.net> wrote in message
> > news:7d353aba.0411081958.4b3879a7@posting.google.com...
> > > Thanks, Dave.  We aren't using Analysis Services.  We are using Cognos
> > > to build our cubes, it doesn't such a feature.  What do you think of
> > > the script idea?
> > >
> > > "Dave Wickert [MSFT]" <dwickert@online.microsoft.com> wrote in message
> >  news:<ekhnpwexEHA.3376@TK2MSFTNGP12.phx.gbl>...
> > > > Using Analysis Services, we fully expect that RI will not
necessarily be
> > > > in-place with your DW. Thus when we are building cubes from this
data,
> > > > Analysis Services has an optional exception reporting facility where
it
> >  will
> > > > log off to a separate file any records which are processed into
cubes
> >  with
> > > > missing dimension entries. This allows you to collect (and report
on)
> >  which
> > > > facts have missing surrogate keys.
> > > > -- 
> > > > Dave Wickert [MSFT]
> > > > dwickert@online.microsoft.com
> > > > Program Manager
> > > > BI SystemsTeam
> > > > SQL BI Product Unit (Analysis Services)
> > > > --
> > > > This posting is provided "AS IS" with no warranties, and confers no
> >  rights.
> > > >
> > > > "TP" <thom@pantazi.net> wrote in message
> > > > news:7d353aba.0411081318.645fafc5@posting.google.com...
> > > > > We have a small data warehouse (less than 50 million rows in
roughly
> > > > > 75 gigabytes) however, it is growing steadily.  We have noticed
that
> > > > > we get much better performance on queries when there are no
foreign
> > > > > key constraints.  We have demonstrated a 14% improvement.  I am
aware
> > > > > that many data warehouses are designed without foreign key
constraints
> > > > > to improve the load performance of the ETL process.  We have yet
to
> > > > > need that, as a matter of fact until now, we have used a script to
> > > > > build the constraints before the ETL process and drop them after
it.
> > > > >
> > > > > The process of rebuilding the constraints is time consuming.  As
more
> > > > > data is put into the warehouse the time to build the constraints
will
> > > > > continue to become longer.  At some point we anticipate it
impacting
> > > > > our batch window.  It has been my recommendation that we
completely
> > > > > remove the constraints and leave them off.
> > > > >
> > > > > Since our use for the constraints is to check the integrity of the
ETL
> > > > > process, I was wondering if anyone has a methodology for
validating
> > > > > the surrogate keys besides building foreign key constraints.  We
have
> > > > > considered building a series of test scripts that will perform
selects
> > > > > against the data looking for invalid key values.
> > > > >
> > > > > Thanks,
> > > > > Thom


Relevant Pages

  • Re: Question on permissions for base tables of Materialized view and mv logs
    ... PCT not supported with this type of partitioning ... SQL> create materialized view dis_eul.dis_mv ... NOCACHE ... NOPARALLEL ...
    (comp.databases.oracle.server)
  • Re: Validating Surrogate Keys
    ... the timestamp function would absolutely work. ... you point me to a good resource describing data partitioning on SQL ... Assuming that have an index built on the dimension tables using the ... We aren't using Analysis Services. ...
    (microsoft.public.sqlserver.datawarehouse)
  • Re: Critique of Robert C. Martins "Agile Principles, Patterns, and Practices"
    ... since all the SQL is concentrated in one place rather than spread out and duplicated. ... From my point of view it is significantly less since I can forget about the SQL and Schema while dealing with the business rules and interface. ... The partitioning of the code in the Payroll example is just that, ... There are many reasons why such a separation might be beneficial. ...
    (comp.object)
  • Re: Partition???
    ... SQL Server 2000 Resource Kit (if you can't get it from your local ... The SQL RK is one of the resource kits that TechNet ... The SQL RK has an entire chapter on AS partitioning. ... What is Partition in Analysis Service? ...
    (microsoft.public.sqlserver.olap)
  • Re: Critique of Robert C. Martins "Agile Principles, Patterns, and Practices"
    ... wrap the SQL. ... forget about the SQL and Schema while dealing with the business rules ... The partitioning of the code in the Payroll example is just that, ... such a separation might be beneficial. ...
    (comp.object)