Re: Processing large files with TextFieldParser

Tech-Archive recommends: Repair Windows Errors & Optimize Windows Performance



"Jon Spivey" <js@xxxxxxxxxxxxxxxxx> wrote in
news:u#Orc$dcKHA.1028@xxxxxxxxxxxxxxxxxxxx:

I've got a large (1.1m rows) csv file which needs parsing and sticking
into sql server - the job needs doing every day and if anything the
csv will get larger over time. At present I'm using a TextFieldParser
to parse the csv line by line and add to the database. This fails
probably 2 times in 3, if it's going to fall over it's usually at
around 200,000 lines. Looking for suggestions as to how to do this
robustly, on a shared server which doesn't allow bulk insert. Fair to
assume the server is a factor in failure but I can't upgrade just yet.

Having done this numerous times, I find the best way is to use a
StreamReader and read in line by line, esp. with large files, as trying
to store everything in memory (whether DataSet or objects) ends up
unwieldy.

With a good regex, you can divide out the elements, even if there is a
text delimiter (usually some form of quote). I have written my own, but
I would not be surprised if there are others.

Another direction to conquer this, as you are storing in SQL Server, is
to use SSIS (or DTS in older versions). SSIS has the ability to read a
CSV file.

If this is a file format you can set up a BCP file for, you can bulk
load the items into SQL Server, as well. NOTE that this will not work if
you have to manipulate the CSV flat file into multiple tables, however.

In the past, I architected a system that had multiple GB files that had
to be manipulated. The solution was to leave the data in flat files and
manipulate out into files that mimicked SQL Server tables. I then
incremented the IDENTITY values and seeded the flat files. This required
many passes and some file sorts to get things into SQL Server, so it is
overkill if the file is very predictable and/or does not require
extensive manipulation.

Peace and Grace,


--
Gregory A. Beamer (MVP)

Twitter: @gbworld
Blog: http://gregorybeamer.spaces.live.com

*******************************************
| Think outside the box! |
*******************************************
.



Relevant Pages

  • Re: Processing large files with TextFieldParser
    ... tested with the 1st csv and it worked perfectly. ... StreamReader and read in line by line, ... Another direction to conquer this, as you are storing in SQL Server, is ... you have to manipulate the CSV flat file into multiple tables, ...
    (microsoft.public.dotnet.framework.aspnet)
  • Re: Ftp from within excel
    ... ie when the sql server generated a csv file, this process is completed, the ... > connection when that file is done and then to start download. ... > Dim strDirectoryList As String ...
    (microsoft.public.excel.charting)
  • Re: Ftp from within excel
    ... ie when the sql server generated a csv file, this process is completed, the ... > connection when that file is done and then to start download. ... > Dim strDirectoryList As String ...
    (microsoft.public.excel.misc)
  • Re: Ftp from within excel
    ... ie when the sql server generated a csv file, this process is completed, the ... > connection when that file is done and then to start download. ... > Dim strDirectoryList As String ...
    (microsoft.public.excel.printing)
  • Re: Ftp from within excel
    ... ie when the sql server generated a csv file, this process is completed, the ... > connection when that file is done and then to start download. ... > Dim strDirectoryList As String ...
    (microsoft.public.excel.crashesgpfs)