Re: StreamReader->regex->Win32 api marshalling



When I said the csv data was rough, I meant it! As an example: it uses a
specific configurable field delimiter for all the fields in a record except
the last field in the record which uses a \r\n. BUT ... the last field (and
in fact any other field) may also contain \r\n in its data! My solution to
this was to use the regex definition of the format of the first field in each
record to find the end of the last field. Would I be able to handle this with
your csv parser?
Thanks
Tim

"shriop" wrote:

> Lots of info here, but I'll try to split it out.
>
> Regex's are a notoriously bad performance wise way to accomplish what
> you're trying to do.
> http://www.codeproject.com/cs/database/CsvReader.asp
> http://www.csvreader.com/csv_benchmarks.html
>
> I would suggest you either implement the code yourself using raw char
> array handling to find the commas, etc, etc, or purchase a commercial
> csv parser like the one I sell, http://www.csvreader.com .
>
> bcp is not what I would recommend for this because it does not properly
> handle escape sequences, and does not necessarily have a performance
> benefit that I've seen over dts. I would recommend calling a dts job
> instead to import this data. If you don't need to do any data
> manipulation to the original csv file, then there's no reason to even
> do any csv parsing, just let dts do it for you. If you do want to try
> the batching of insert statements like one of the previous comments
> suggested, make sure you start a transaction before doing the chunks of
> inserts, then commit at the end of a chunk. Otherwise, you'll get a
> performance hit as the db tries to commit after every single insert.
>
> Scott wrote:
> > Tim wrote:
> >
> > > db. It all works fine, but the performance is just not good enough -
> > > I need to at least halve the time its taking to process the data.
> > > I've profiled the code and it looks like the overhead is more or less
> > > evenly split between the reading/parsing on the one hand and the data
> > > loading/bcp/Win32 api marshalling on the other hand - so I'm looking
> >
> > Are you using the Compile flag on the regex?
> >
> > I would also try to insert to the database directly from your code by
> > batching the statements: insert xxx xxxx xxx;insert xxx xxxx xxx;insert
> > xxx xxxx xxx;... I would start by batching into groups of a 100 and
> > then try to tweak from there.
> >
> > I've had very good performance success with these two strategies.
> >
> > Scott C.
>
>
.



Relevant Pages

  • Re: Numbers are interpreted as varchar columns by the text file connection
    ... With Excel connection DTS can read column specifications whereas CSV as you ... > DTS interprets them correctly as a float or double column. ...
    (microsoft.public.sqlserver.dts)
  • Re: StreamReader->regex->Win32 api marshalling
    ... as the delimiter char; which is fine unless the last field in the row is text ... >> When I said the csv data was rough, ... >>> bcp is not what I would recommend for this because it does not properly ... >>> benefit that I've seen over dts. ...
    (microsoft.public.dotnet.framework.performance)
  • Re: Parser for CSV files
    ... >,I probably need another DTS package to unzip the data from DTS and ... > just pass the path to the CSV to your DTS package. ... Most people I know that have these types of uploads do not load records ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: .Net newbie-- can you save datasets into other formats?
    ... I think the thing is that existing tools are available to convert SQL Server data to CSV and Excel. ... DTS being an excellent example of a multi-purpose tool that can access all kinds of ODBC, ... They didn't want to burden the ADO.NET core with all kinds of CSV formats and Excel formats since such a tool already exists. ...
    (microsoft.public.dotnet.framework.adonet)
  • Re: StreamReader->regex->Win32 api marshalling
    ... I would recommend calling a dts job ... make sure you start a transaction before doing the chunks of ... performance hit as the db tries to commit after every single insert. ... Scott wrote: ...
    (microsoft.public.dotnet.framework.performance)

Loading