Re: Tidying data

From: Rick Sawtell (r_sawtell_at_hotmail.com)
Date: 02/22/05


Date: Tue, 22 Feb 2005 07:33:42 -0600

Load your data into the table (or a temp table for staging).

UPDATE <tablename>
    SET col1 = NULLIF(LTRIM(RTRIM(col1)), ''),
            col2 = NULLIF(LTRIM(RTRIM(col2)), ''),

and so forth.

Basically, this should do the following
1. Working from the inside out...
       Right Trim (trailing blanks) from the end of the column.
2. Left trim (leading blanks) from the front of the column.
3. Compare the result value with an empty string.
-- If they match, then return NULL, else the Column.

(LTRIM and RTRIM from the BOL return an empty string when they encounter
NULL values.)

This should be pretty fast for you.

Rick Sawtell
MCT, MCSD, MCDBA

"Griff" <Howling@The.Moon> wrote in message
news:uX3k2DOGFHA.3908@TK2MSFTNGP12.phx.gbl...
> I have a database table with >20,000 rows and >50 fields, ~45 of which are
> VARCHARS.
>
> The data in this database is populated from a fixed-width text file using
a
> process that is outside of my control. However, this results in many of
the
> fields having leading and/or trailing spaces.
>
> I have to find a way to tidy this data up....
>
> I adopted the tactic of getting all the table's data back as a
disconnected
> recordset and iterating through every row & field. If the type is VARCHAR
> and the value is not NULL, then I simply update the field's value with the
> trimmed value.
>
> Iterating through this is extremely quick.
>
> However, the UpdateBatch command is EXTREMELY SLOW. So slow that it takes
~
> 5 minutes to update ~ 4000 rows.
>
> Any alternative tactics I could use?
>
> Thanks
>
> Griff
>
>



Relevant Pages

  • Re: Tidying data
    ... > recordset and iterating through every row & field. ... the UpdateBatch command is EXTREMELY SLOW. ... > Any alternative tactics I could use? ...
    (microsoft.public.sqlserver.server)
  • Tidying data
    ... I have a database table with>20,000 rows and>50 fields, ~45 of which are ... Iterating through this is extremely quick. ... the UpdateBatch command is EXTREMELY SLOW. ... Any alternative tactics I could use? ...
    (microsoft.public.data.ado)
  • Tidying data
    ... I have a database table with>20,000 rows and>50 fields, ~45 of which are ... Iterating through this is extremely quick. ... the UpdateBatch command is EXTREMELY SLOW. ... Any alternative tactics I could use? ...
    (microsoft.public.sqlserver.server)
  • RE: Tidying data
    ... Did you try inserting trimmed data to new table? ... > recordset and iterating through every row & field. ... the UpdateBatch command is EXTREMELY SLOW. ...
    (microsoft.public.data.ado)
  • RE: Tidying data
    ... Did you try inserting trimmed data to new table? ... > recordset and iterating through every row & field. ... the UpdateBatch command is EXTREMELY SLOW. ...
    (microsoft.public.sqlserver.server)