Re: Bulk inserting into database using ADO.NET... deadly slow?

Tech-Archive recommends: Speed Up your PC by fixing your registry

From: Pablo Castro [MS] (pablocas_at_online.microsoft.com)
Date: 05/01/04


Date: Sat, 1 May 2004 14:48:55 -0700

Hello Massimo,

It's very unlikely that we'll be able to beat DTS. However, we can get
closer by doing this:

1. Wrap updates in transactions. This will give you a noticeable perf boost.
You mention before that you insert in rounds of ~5K/10K rows. Right before
calling Update() on the adapter, call begin transaction on the connection
object that you're using with the adapter, and set the resulting
SqlTransaction instance to the update/insert/delete commands. Then commit
after the call to Update and call beingtransaction again before the next
call to repeat the cycle.

2. You can also bypass the dataset/adapter all together and use INSERT
statatements with SqlCommand.ExecuteNonQuery. In this case you can
concatenate many insert statements together and wrap each batch in a
transaction. This will give another perf boost because you'll reduce the
round-trips in several orders of magnitude.

FWIW, in the next version of ADO.NET we'll have bulk-insert APIs, so you can
expect huge perf improvements in this area :)

-- 
Pablo Castro
Program Manager - ADO.NET Team
Microsoft Corp.
This posting is provided "AS IS" with no warranties, and confers no rights.
"Massimo" <barone@mclink.it> wrote in message
news:OCrvPUyLEHA.3712@TK2MSFTNGP11.phx.gbl...
> I have this SQL Server 2000 database, which I want to populate with data
> generated from a program; the schema is quite simple (one table, seven
> columns, each one one byte wide), but there are plenty of data: about 450
> million rows, for an estimated total of ~3 GB.
> So, I wrote my data-generating .NET application, and then tried to put
data
> into a DataSet and then storing it into the DB using a SqlDataAdapter...
and
> it was painful. My first guess was to store as much data as possible into
> the DataSet, so to reduce the DB updates, but I found that when reaching
> about 100K rows the DataSet becomes slower and slower, each insertion
taking
> more time than the preceding ones. So i resorted to update the DB more
> often, and I tried doing it every 20000, every 10000 and every 5000 rows
> added, but it always seemed to take a lot of time (i.e. more than one
minute
> to insert 20K rows).
> Finally, I decided to avoid DataSets, and I simply wrote all of the
program
> output to a 6 GB CSV text file, and imported it into the DB using DTS;
and,
> wonderful... it started inserting 40-50K rows *per second*; after about
five
> hours the job was completed.
> So, here's the question: why is it so painfully slow to insert big chunks
of
> data into a DB using DataSets? It's so slow that even a 6 GB text file
> provides a *much* better throughput...
>
> Massimo
>


Relevant Pages

  • RE: DTS Transaction handling in a multi phase data pump
    ... Distributed Transaction Coordinator or alternatively writing a stored ... > DTS Problem definition: ... > than 10,000 records) to the Staging table (say, TableA). ... > The Multi-phase data pump approach was used to achieve this purpose. ...
    (microsoft.public.sqlserver.dts)
  • DTS Error: Unable to enlist in the transaction
    ... I'm looking to use DTS to delete and reload some volatile data from ... Running the DTS package designer locally, ... - I've tried this with both Windows Authentication and SQL Server ... select "Join Transaction If Present" and "Rollback ...
    (microsoft.public.sqlserver.dts)
  • Transaction Log File Full
    ... I am looking for advise on how to do this without the transaction ... I am using DTS, as this allows me to view the row count of records ... Throughout the time DTS was executing, I monitored the size of the ... What is the best way for me to load this data, without it being logged, ...
    (microsoft.public.sqlserver.server)
  • Re: Transaction Log File Full
    ... Only way I could say is commit the transaction in batches, ... BCP IN from the text file with '-b' option. ... I use the following SQL as the DTS source: ...
    (microsoft.public.sqlserver.server)