Re: StreamReader->regex->Win32 api marshalling
- From: Tim <Tim@xxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Thu, 1 Sep 2005 02:21:03 -0700
When I said the csv data was rough, I meant it! As an example: it uses a
specific configurable field delimiter for all the fields in a record except
the last field in the record which uses a \r\n. BUT ... the last field (and
in fact any other field) may also contain \r\n in its data! My solution to
this was to use the regex definition of the format of the first field in each
record to find the end of the last field. Would I be able to handle this with
your csv parser?
Thanks
Tim
"shriop" wrote:
> Lots of info here, but I'll try to split it out.
>
> Regex's are a notoriously bad performance wise way to accomplish what
> you're trying to do.
> http://www.codeproject.com/cs/database/CsvReader.asp
> http://www.csvreader.com/csv_benchmarks.html
>
> I would suggest you either implement the code yourself using raw char
> array handling to find the commas, etc, etc, or purchase a commercial
> csv parser like the one I sell, http://www.csvreader.com .
>
> bcp is not what I would recommend for this because it does not properly
> handle escape sequences, and does not necessarily have a performance
> benefit that I've seen over dts. I would recommend calling a dts job
> instead to import this data. If you don't need to do any data
> manipulation to the original csv file, then there's no reason to even
> do any csv parsing, just let dts do it for you. If you do want to try
> the batching of insert statements like one of the previous comments
> suggested, make sure you start a transaction before doing the chunks of
> inserts, then commit at the end of a chunk. Otherwise, you'll get a
> performance hit as the db tries to commit after every single insert.
>
> Scott wrote:
> > Tim wrote:
> >
> > > db. It all works fine, but the performance is just not good enough -
> > > I need to at least halve the time its taking to process the data.
> > > I've profiled the code and it looks like the overhead is more or less
> > > evenly split between the reading/parsing on the one hand and the data
> > > loading/bcp/Win32 api marshalling on the other hand - so I'm looking
> >
> > Are you using the Compile flag on the regex?
> >
> > I would also try to insert to the database directly from your code by
> > batching the statements: insert xxx xxxx xxx;insert xxx xxxx xxx;insert
> > xxx xxxx xxx;... I would start by batching into groups of a 100 and
> > then try to tweak from there.
> >
> > I've had very good performance success with these two strategies.
> >
> > Scott C.
>
>
.
- Follow-Ups:
- Re: StreamReader->regex->Win32 api marshalling
- From: shriop
- Re: StreamReader->regex->Win32 api marshalling
- References:
- Re: StreamReader->regex->Win32 api marshalling
- From: Scott
- Re: StreamReader->regex->Win32 api marshalling
- From: shriop
- Re: StreamReader->regex->Win32 api marshalling
- Prev by Date: Re: StreamReader->regex->Win32 api marshalling
- Next by Date: Re: C++.NET & message loop
- Previous by thread: Re: StreamReader->regex->Win32 api marshalling
- Next by thread: Re: StreamReader->regex->Win32 api marshalling
- Index(es):
Relevant Pages
|
Loading