Generating one table with a terabyte of data



Hi All,

I need ideas on how to improve a process that can generate a terabyte of
data in a week. Our company produces software that analyzes code and one of
the things we do is Code Duplication. We're processing hundreds of millions
of lines against each other and generate gigabytes of results. In reality we
generate over 20 gigabytes of data a day. We are noticing major slow downs in
the system. Here's how we're going about it

1. Non-Indexed table residing on it's own FileGroup (which is not on the
same drive as the SQL server).
2. A process that generates Text files that we use BULK INSERT to load the
data (any other way of doing inserts is just to slow).

This process should generate 10 million rows in the table per hour. Only
problem is that we're noticing a considerable slow down as we pummel the
Database. We're making sure our Bulk Insert has exclusive table locks and
nobody else accesses the system while this is happening. We're also noticing
a strange behavior at the OS level. The system just slows down to a crawl.
Even mouse movement becomes jagged (Win2K Server).

I should also mention that once we're done inserting we create indexes at
the end. Average time to create 1 indexed field (on a different filegroup) on
100 million rows is about 3 to 4 hours. Seems a little excessive to me, must
be doing something wrong.

I've gone over this 100 times and I can't see anything else to improve
performance. I have a friend that is an ORACLE DBA and he keeps telling me
that SQL server just can't handle this type of volume and that I should move
to ORACLE. I refuse to believe ORACLE can do more then SQL Server. He
actually took one of our 500 meg text files and loaded it in 1/10 the time it
took SQL Server. My answer to that is "I must be configuring it incorrectly".

I've locked SQL server to 1/3 physical memory usage. DB growth is 512Meg
chunks. We don't need any transaction logs, so we're using RECOVERY MODEL =
SIMPLE (any other mode kills SQL server anyway, LOG fills to fast with our
data creation speed). I've thrown everything at this except the kitchen sink.

Any suggestions on how I can improve my Bulk Inserts and how to improve my
Index creations? Should I consider SQL Server 2005? Will it handle my needs
better?

.



Relevant Pages

  • Strange behaviour with SQLBulkOperations
    ... I'm trying to implement bulk inserts via ODBC. ... I then tried it with Oracle Express, ... // Set the cursor type. ... Shouldn't SQL Server and Oracle support bulk operations? ...
    (microsoft.public.data.odbc)
  • Re: Generating one table with a terabyte of data
    ... Pretty much disabled the transaction log completely. ... We are using BULK INSERT ... for all those who don't know, SQL server prefers smaller BULK ... ORACLE seems to handle that 1 big file really well but with small files SQL ...
    (microsoft.public.sqlserver.dts)
  • Re: Bulk Copy from datasource A to B
    ... The same can be done for Oracle. ... optimizations will have to be per database, ... >> to create an UpdateGram or DiffGram and put it into SQL Server with one ... >> In other words, loop and INSERT for MySql, but bulk for SQL ...
    (microsoft.public.dotnet.framework.adonet)
  • Re: IOT, memory and transaction time
    ... easily generate it with DBMS_METADATA.GET_DDL if your verison of Oracle ... were pretty consistent no matter how big the table (this transaction ... FOREIGN KEY (versionNo) REFERENCES T_TRANSACTIONS, ... Oracle provides read consistency and SQL Server ...
    (comp.databases.oracle.misc)
  • Re: Database or store to handle 30 Mb/sec and 40,000 inserts/sec
    ... "Writers block readers and readers block writers in SQLServer. ... getting around this fundamental issue and because of it SQL Server ... admitted they can happen in Oracle, I guess Oracle should never been used ... isolation level but you will always disagree because you follow doctorine ...
    (comp.databases.oracle.server)

Loading