Re: Display Changed Records
- From: "Russell Fields" <russellfields@xxxxxxxxxx>
- Date: Mon, 12 Oct 2009 16:23:15 -0400
Steen,
No hash system is 100% reliable (including HashBytes) and my comment was not meant to criticize you, but just to give a warning.
However, the BOL says: "we do not recommend using CHECKSUM to detect whether values have changed, unless your application can tolerate occasionally missing a change." (Not surprising since it is only entended as a simple hashing algorithm, primarily for creating shorter indexes for long character strings.) CHECKSUM is less reliable for checking changes than many people think it is.
Following is a script that demonstrates this for those who are curious.
SET NOCOUNT ON
declare @value1 int
declare @value2 int
declare @value3 int
create table #hashtest (a int, b int, c int)
set @value1 = 1
While @value1 <= 100
begin
set @value2 = 1
While @value2 <= 100
begin
set @value3 = 1
While @value3 <= 100
begin
insert into #hashtest values (@value1, @value2, @value3)
set @value3 = @value3 + 1
end
set @value2 = @value2 + 1
end
set @value1 = @value1 + 1
end
select checksum(*), count(*) from #hashtest
group by checksum(*)
having count(*) > 1
order by count(*) desc, checksum(*)
-- Choose one hash collision for examination
select * from #hashtest
where checksum(*) = 1793
order by a, b, c
select hashbytes ('MD5', CAST (a AS CHAR(10))+CAST (b AS CHAR(10))+CAST (c AS CHAR(10))),
count(*) from #hashtest
group by hashbytes ('MD5', CAST (a AS CHAR(10))+CAST (b AS CHAR(10))+CAST (c AS CHAR(10)))
having count(*) > 1
order by count(*) desc, hashbytes ('MD5', CAST (a AS CHAR(10))+CAST (b AS CHAR(10))+CAST (c AS CHAR(10)))
drop table #hashtest
With this artificial dataset, there are no rows with a unique CHECKSUM hash. With the HashBytes all rows have unique hashes. By examining the rows for CHECKSUM(*) = 1793, you will see that the Exclusive Or algorithm has a pretty obvious repeating pattern of conflicts.
All of this is just a For What It Is Worth.
RLF
"Steen Schlüter Persson" <steen@xxxxxxxxxxxxxxxxxxxxxxx> wrote in message news:ONkTYyCSKHA.1908@xxxxxxxxxxxxxxxxxxxxxxx
I referred to the CHECKSUM as a method to determine if some of the values in a record has changed - I think that was what the OP was looking after. I agree that it might not be 100% reliable and that's also what BOL it saying. If that's acceptable, I find it quite usefull though and it had worked fine for me so far.
I agree that there are other methods to determine if data has changed but it depends on what the requirements are. Where I use it, is where I only want to check for changes in some columns out of the whole record. If I e.g. has a source table with 100 columns, I maybe only want to do an update if one or more out of 30 specific columns has changed. In that case a checksum calculation on those 30 columns has worked fine for me
I'm of course all ears if there are other methods that I don't know about or haven't thought of.
--
Regards
Steen Schlüter Persson (DK)
"Russell Fields" <russellfields@xxxxxxxxxx> wrote in message news:u82Z8wfRKHA.1796@xxxxxxxxxxxxxxxxxxxxxxxIf you want to use something like CHECKSUM, I suggest using HashBytes instead. The HashBytes algorithms are all better than CHECKSUM for avoiding collisions, but every algorithm has its weaknesses. See warnings and disclaimers in:
http://msdn.microsoft.com/en-us/library/ms345262.aspx
If you want to read about the 5 hash algorithms supported, Wikipedia (among others) has articles on them all.
Having said that, it all depends on why you were using CHECKSUM. If it was your method of trying to determine if something in a row changed, then it is not a really good solution. Any of the Hashbytes algorithms will be more resilient than CHECKSUM.
RLF
"Steen Schlüter Persson" <steen@xxxxxxxxxxxxxxxxxxxxxxx> wrote in message news:%23w68u0bRKHA.4244@xxxxxxxxxxxxxxxxxxxxxxxAs already mentioned, there are no easy way of doing this, but another suggestion is to use the BINARY_CHECKSUM/CHECKSUM function.
You can then calculate a checksum based on the fields you want to compare. If you then store this checksum, you can compare with the new checksum and it if has changed, then the record has changed.
With Binary_Checksum, you can also use * so if you want to compare all fields in a table, you don't have to type in all the column names.
There are a few data types that is being ignored when using the function though so if you table contains any of there data types you might have to be a different check on these.
If you read up on the function in BOL you'll see that it's not absolutely garanteed that it will detect all changes, so it might not be good enough in your case, but that depends on your requirement(s).
--
Regards
Steen Schlüter Persson (DK)
"jazman58" <jazman1958@xxxxxxxxxxxxxxxxxxx> wrote in message news:D7adncBC1PjmslvXnZ2dnUVZ_oOdnZ2d@xxxxxxxxxxxxxxxBump up again
"jazman58" <jazman1958@xxxxxxxxxxxxxxxxxxx> wrote in message news:QZydnQm1jOLoVzrXnZ2dnUVZ_h6dnZ2d@xxxxxxxxxxxxxxx2005, why is there something different in 2008 cause i can install that.
What I am dealing with here, is 5 million records, with 271 fields, not my design. I am supplied with a quarterly update, a complete replacement database, and I need to prepare a report of the changes, so I need to extract the records that have changed in table2 and get a copy of the before record in table1 and create a report showing the changes.
I was hoping there was a facility to compare records in two tables. With out having compare each field.
"Uri Dimant" <urid@xxxxxxxxxxx> wrote in message news:O8nD6NSMKHA.5880@xxxxxxxxxxxxxxxxxxxxxxxAre you using SQL Server 2008?
"jazman58" <jazman1958@xxxxxxxxxxxxxxxxxxx> wrote in message news:8vednYU7fYTbDzvXnZ2dnUVZ_r-dnZ2d@xxxxxxxxxxxxxxxI have two tables:
table1, table2
Would like to display all records in table2 that have changed since table1.
Is there an sql command to do that?
Thanks,wj
.
- References:
- Re: Display Changed Records
- From: jazman58
- Re: Display Changed Records
- From: Steen Schlüter Persson
- Re: Display Changed Records
- From: Russell Fields
- Re: Display Changed Records
- From: Steen Schlüter Persson
- Re: Display Changed Records
- Prev by Date: Delete records with duplicate emails
- Next by Date: Default an identity value
- Previous by thread: Re: Display Changed Records
- Next by thread: When working in Query Analyzer, what can I use in order to bring up a dialog box?
- Index(es):
Relevant Pages
|