Re: Generating one table with a terabyte of data



On fast bulk loads you should make sure that the database is not logging.
If you keep track of an identity or some key field you can undo the load if
necessary. Another thing is to split the updates across file sets. Each
file can get it's own thread where, I believe, only one thread can be
release per file for IO. This can be done by portioning the data.

You are running up against the classic issues of Data Warehousing. Fast
loading of daily data in the fastest time. The rules for a data warehouse
fir perfectly here. No transactions and many small file sets to allow
parallel loading of a table.

The more memory you can add to the box the better. More processors - at
least two. Use /3GB to give SQLServer a larger address space but only if
you have 4GB of memory or more on the Datacenter Editions.

The slowness of the UI says that the database is working. On SQS this is
what should happen. The UI is low priority on a server.

You failed to say which OS and hardware specs. This is very important to
tuning.
Consider that you can set the database and OS up for loading and then reboot
with an operational configuration.

If you use multiple files for partitioned data be sure the files sets are on
different drives. The drives should NOT be raided for maximum performance.
If the bulk load is fast enough you can always reload (like we do in a DW)
instead of doing backups. The raw data is archived to a fast tape and a
loader can reload it faster than you can restore a database from tape.

In the past I have split the databases up. Fast Loader.Read only
datawarehouse tables and transactional tables in different databases. The
read-only DW databases are not backed up but the others are.

On a 3 Tb DW we could loaded close to over 1.5 G records per day with an
average record greater than 700 bytes. The loading had to be complete by 7
AM and did not start until after 2 AM. 4 1.2Ghz (64bit) Processors and a
RAID farm. We didn't have to tear down the RAID 5 farm because we had
hundreds of file sets and could optimize parallel loading. We also had 8
Channels of Raid and 4Gb of memory. Building this beast we learned a lot
about optimizing loading.

I applied the lessons to a SQLServer 7.0 instance on a 4 Processor Proliant
with 4Gb and one RAID channel. We were able to blast a few hundred million
records in daily within a four hour window something everyone said would be
impossible. Like you we needed to do repeated passes on these records to
find subtle matches in the data. (Data Mining). Applying the DW rules
worked for this application.

--
Jim Vierra

"Kevin3NF" <KHill@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:%23i42snMVFHA.628@xxxxxxxxxxxxxxxxxxxxxxx
> I've locked SQL server to 1/3 physical memory usage. DB growth is 512Meg
> chunks
>
> You need to free it up to fly, and pre-allocate the db size. In-process
> file growth will kill performance
>
> --
> Kevin Hill
> President
> 3NF Consulting
>
> www.3nf-inc.com/NewsGroups.htm
>
> www.DallasDBAs.com/forum - new DB forum for Dallas/Ft. Worth area DBAs.
>
> www.experts-exchange.com - experts compete for points to answer your
> questions
>
>
> "lee" <lee@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
> news:B772A8C2-F009-4E29-816C-5B6C2F45151E@xxxxxxxxxxxxxxxx
>> Hi All,
>>
>> I need ideas on how to improve a process that can generate a terabyte of
>> data in a week. Our company produces software that analyzes code and one
>> of
>> the things we do is Code Duplication. We're processing hundreds of
>> millions
>> of lines against each other and generate gigabytes of results. In reality
>> we
>> generate over 20 gigabytes of data a day. We are noticing major slow
>> downs in
>> the system. Here's how we're going about it
>>
>> 1. Non-Indexed table residing on it's own FileGroup (which is not on the
>> same drive as the SQL server).
>> 2. A process that generates Text files that we use BULK INSERT to load
>> the
>> data (any other way of doing inserts is just to slow).
>>
>> This process should generate 10 million rows in the table per hour. Only
>> problem is that we're noticing a considerable slow down as we pummel the
>> Database. We're making sure our Bulk Insert has exclusive table locks and
>> nobody else accesses the system while this is happening. We're also
>> noticing
>> a strange behavior at the OS level. The system just slows down to a
>> crawl.
>> Even mouse movement becomes jagged (Win2K Server).
>>
>> I should also mention that once we're done inserting we create indexes at
>> the end. Average time to create 1 indexed field (on a different
>> filegroup) on
>> 100 million rows is about 3 to 4 hours. Seems a little excessive to me,
>> must
>> be doing something wrong.
>>
>> I've gone over this 100 times and I can't see anything else to improve
>> performance. I have a friend that is an ORACLE DBA and he keeps telling
>> me
>> that SQL server just can't handle this type of volume and that I should
>> move
>> to ORACLE. I refuse to believe ORACLE can do more then SQL Server. He
>> actually took one of our 500 meg text files and loaded it in 1/10 the
>> time it
>> took SQL Server. My answer to that is "I must be configuring it
>> incorrectly".
>>
>> I've locked SQL server to 1/3 physical memory usage. DB growth is 512Meg
>> chunks. We don't need any transaction logs, so we're using RECOVERY MODEL
>> =
>> SIMPLE (any other mode kills SQL server anyway, LOG fills to fast with
>> our
>> data creation speed). I've thrown everything at this except the kitchen
>> sink.
>>
>> Any suggestions on how I can improve my Bulk Inserts and how to improve
>> my
>> Index creations? Should I consider SQL Server 2005? Will it handle my
>> needs
>> better?
>>
>
>


.



Relevant Pages

  • Re: Generating one table with a terabyte of data
    ... Run perfmon counters to a log and analyze them after a bulk load. ... > On fast bulk loads you should make sure that the database is not logging. ... >> I've locked SQL server to 1/3 physical memory usage. ...
    (microsoft.public.sqlserver.dts)
  • Re: Bulk Copy from datasource A to B
    ... The same can be done for Oracle. ... optimizations will have to be per database, ... >> to create an UpdateGram or DiffGram and put it into SQL Server with one ... >> In other words, loop and INSERT for MySql, but bulk for SQL ...
    (microsoft.public.dotnet.framework.adonet)
  • EXCEPTION ACCESS VIOLATION
    ... I'm using VS2008 vb.net for a windows forms application, with SQL Server ... The bulk copy target is a database on the local computer. ... dim bcp as new SQLClient.SQLBulkCopy..... ...
    (microsoft.public.dotnet.languages.vb)
  • Re: ADO, ADO.NET v. DTS
    ... load large amounts of data into SQL Server. ... These include the BULK INSERT ... > What is it about a DTS that allows you to move a table from one server ... assume the database is in simple recovery mode, ...
    (microsoft.public.sqlserver.server)
  • Re: MS Access DAO -> ADO.NET Migration
    ... William Vaughn ... Microsoft MVP ... Hitchhiker's Guide to Visual Studio and SQL Server ... My migration app works building a SSCE database file with imported data ...
    (microsoft.public.dotnet.framework.adonet)