Re: Bulk inserting into database using ADO.NET... deadly slow?
From: Pablo Castro [MS] (pablocas_at_online.microsoft.com)
Date: 05/01/04
- Next message: David Jessee: "Re: Using GUID versus Autoincrement Values"
- Previous message: Massimo: "Re: Using GUID versus Autoincrement Values"
- In reply to: Massimo: "Bulk inserting into database using ADO.NET... deadly slow?"
- Next in thread: Massimo: "Re: Bulk inserting into database using ADO.NET... deadly slow?"
- Reply: Massimo: "Re: Bulk inserting into database using ADO.NET... deadly slow?"
- Messages sorted by: [ date ] [ thread ]
Date: Sat, 1 May 2004 14:48:55 -0700
Hello Massimo,
It's very unlikely that we'll be able to beat DTS. However, we can get
closer by doing this:
1. Wrap updates in transactions. This will give you a noticeable perf boost.
You mention before that you insert in rounds of ~5K/10K rows. Right before
calling Update() on the adapter, call begin transaction on the connection
object that you're using with the adapter, and set the resulting
SqlTransaction instance to the update/insert/delete commands. Then commit
after the call to Update and call beingtransaction again before the next
call to repeat the cycle.
2. You can also bypass the dataset/adapter all together and use INSERT
statatements with SqlCommand.ExecuteNonQuery. In this case you can
concatenate many insert statements together and wrap each batch in a
transaction. This will give another perf boost because you'll reduce the
round-trips in several orders of magnitude.
FWIW, in the next version of ADO.NET we'll have bulk-insert APIs, so you can
expect huge perf improvements in this area :)
-- Pablo Castro Program Manager - ADO.NET Team Microsoft Corp. This posting is provided "AS IS" with no warranties, and confers no rights. "Massimo" <barone@mclink.it> wrote in message news:OCrvPUyLEHA.3712@TK2MSFTNGP11.phx.gbl... > I have this SQL Server 2000 database, which I want to populate with data > generated from a program; the schema is quite simple (one table, seven > columns, each one one byte wide), but there are plenty of data: about 450 > million rows, for an estimated total of ~3 GB. > So, I wrote my data-generating .NET application, and then tried to put data > into a DataSet and then storing it into the DB using a SqlDataAdapter... and > it was painful. My first guess was to store as much data as possible into > the DataSet, so to reduce the DB updates, but I found that when reaching > about 100K rows the DataSet becomes slower and slower, each insertion taking > more time than the preceding ones. So i resorted to update the DB more > often, and I tried doing it every 20000, every 10000 and every 5000 rows > added, but it always seemed to take a lot of time (i.e. more than one minute > to insert 20K rows). > Finally, I decided to avoid DataSets, and I simply wrote all of the program > output to a 6 GB CSV text file, and imported it into the DB using DTS; and, > wonderful... it started inserting 40-50K rows *per second*; after about five > hours the job was completed. > So, here's the question: why is it so painfully slow to insert big chunks of > data into a DB using DataSets? It's so slow that even a 6 GB text file > provides a *much* better throughput... > > Massimo >
- Next message: David Jessee: "Re: Using GUID versus Autoincrement Values"
- Previous message: Massimo: "Re: Using GUID versus Autoincrement Values"
- In reply to: Massimo: "Bulk inserting into database using ADO.NET... deadly slow?"
- Next in thread: Massimo: "Re: Bulk inserting into database using ADO.NET... deadly slow?"
- Reply: Massimo: "Re: Bulk inserting into database using ADO.NET... deadly slow?"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|