Re: Best Performance File Compare: MD5/SHA1 or Byte-by-Byte Checking?
- From: "Mahmoud Al-Qudsi" <mqudsi@xxxxxxxxx>
- Date: 4 Apr 2007 08:37:02 -0700
On Apr 4, 6:31 pm, "Nicholas Paldino [.NET/C# MVP]"
<m...@xxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Mahmoud,
I hate to say this, but in the time it probably took to write this post,
you could have easily generated numbers which show the performance profiles
for your particular case.
I would look at the Stopwatch class, and then start testing to see how
long it would take to perform each operation. The operations themselves, as
well as the code to perform the timing, aren't difficult at all.
If I had to guess, for files that are 1024 bytes, it is probably easier
to just loop through them to see if any of the bytes differ. It would
probably be much faster than hashing the whole thing, since the hash has to
cycle through all of the bytes anyway, whereas the loop can bail out as soon
as it finds a difference between any two of them.
Even in the 512 KB case, you might want to use the method that loops
through two streams. This is an important point. Make sure you do not load
the entire contents of the two files into memory. For the small files, it's
no big deal, but for large files, you are going to take a hit trying to load
that into memory. By reading chunks of the files into memory, and then
comparing the chunks, you are going to make the process much more efficient.
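
A minimal sketch of the chunked comparison described above, assuming plain
file paths and a 4096-byte buffer. The names FileComparer, ReadFull,
StreamsEqual and FilesEqual are hypothetical and not from this thread; only
the System.IO calls are real framework APIs.

```csharp
using System;
using System.IO;

public static class FileComparer
{
    // Assumed buffer size; tune it by measuring on real data.
    private const int BufferSize = 4096;

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the stream ends.
    private static int ReadFull(Stream s, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = s.Read(buffer, total, buffer.Length - total);
            if (n == 0) break; // end of stream
            total += n;
        }
        return total;
    }

    // Compares two streams chunk by chunk and stops at the first mismatch.
    public static bool StreamsEqual(Stream a, Stream b)
    {
        byte[] bufA = new byte[BufferSize];
        byte[] bufB = new byte[BufferSize];
        while (true)
        {
            int readA = ReadFull(a, bufA);
            int readB = ReadFull(b, bufB);
            if (readA != readB) return false; // different lengths
            if (readA == 0) return true;      // both ended, no mismatch seen
            for (int i = 0; i < readA; i++)
            {
                if (bufA[i] != bufB[i]) return false;
            }
        }
    }

    // Checks the file lengths first (no content read needed), then
    // falls back to the chunked stream comparison.
    public static bool FilesEqual(string pathA, string pathB)
    {
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
            return false;
        using (FileStream a = File.OpenRead(pathA))
        using (FileStream b = File.OpenRead(pathB))
        {
            return StreamsEqual(a, b);
        }
    }
}
```

Only two buffers are held in memory regardless of file size, and the length
check lets files of different sizes be rejected without reading any content.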
Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- m...@xxxxxxxxxxxxxxxxxxxxxxxxxxx
"Mahmoud Al-Qudsi" <mqu...@xxxxxxxxx> wrote in message
news:1175699769.523361.105870@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
I'm looking to compare the contents of two files. Files will generally
not exceed 1024 *bytes* in length.
Given this info, and assuming that the accuracy/reliability of SHA1 is
more than enough, is it more efficient to
a) Use System.Security.Cryptography and get the SHA1 of each binary
file and compare the two hashes
b) Create a byte-by-byte checker that loops through the two files and
exits with a false when a byte doesn't match in the same location
between the two files?
Generally speaking, I'd use the second method when dealing with
anything larger than 512 KB, expecting it to take fewer resources and less time.
However, in the case of such small files, is SHA1 a better-performing
alternative? What about MD5?
Assuming 99% of the time the two files will match, is MD5's limited
reliability enough to determine whether the two files are a match? Is
the performance difference between MD5 and SHA1 worth going with MD5
or am I better off sticking with the latter?
I'm guessing MD5 is good enough, that SHA1 takes a lot longer, and
that it won't matter since byte-by-byte is more efficient and faster
code (assuming it's programmed half-decently of course)... But I'd
like to make sure since I'm looking for a minimal hit on system
resources.
Thanks!
Thanks for the info Nicholas,
I'm looking into the stopwatch class even now, and I'm feeling pretty
stupid at how easy it is to use ;-)
Of course I am using a buffer for my byte-by-byte comparer, loading
64 bytes at a time, which, though not efficient, is the best I can do
with such tiny files.
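
For anyone else wanting to run the same test, here is a rough sketch of a
Stopwatch timing harness, assuming two identical 1024-byte temp files and
1000 iterations per method. CompareTiming and its helpers are hypothetical
names; both methods read the whole file with File.ReadAllBytes, which is only
reasonable because the files are tiny. It prints whatever it measures, and the
numbers will vary by machine and disk cache.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Security.Cryptography;

public static class CompareTiming
{
    // Plain byte-by-byte comparison that exits on the first mismatch.
    public static bool BytesEqual(byte[] x, byte[] y)
    {
        if (x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
        {
            if (x[i] != y[i]) return false;
        }
        return true;
    }

    // Option (a): hash both files with SHA1 and compare the hashes.
    public static bool HashesEqual(string pathA, string pathB)
    {
        using (SHA1 sha = SHA1.Create())
        {
            byte[] hashA = sha.ComputeHash(File.ReadAllBytes(pathA));
            byte[] hashB = sha.ComputeHash(File.ReadAllBytes(pathB));
            return BytesEqual(hashA, hashB);
        }
    }

    // Option (b): compare the raw contents directly.
    public static bool ContentsEqual(string pathA, string pathB)
    {
        return BytesEqual(File.ReadAllBytes(pathA), File.ReadAllBytes(pathB));
    }

    public static void Main()
    {
        const int iterations = 1000; // assumed; raise it for steadier numbers
        string a = Path.GetTempFileName();
        string b = Path.GetTempFileName();
        try
        {
            // Two identical 1024-byte files, the common case in the question.
            byte[] data = new byte[1024];
            new Random(1).NextBytes(data);
            File.WriteAllBytes(a, data);
            File.WriteAllBytes(b, data);

            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++) HashesEqual(a, b);
            sw.Stop();
            Console.WriteLine("SHA1 compare: {0} ms", sw.ElapsedMilliseconds);

            sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++) ContentsEqual(a, b);
            sw.Stop();
            Console.WriteLine("Byte compare: {0} ms", sw.ElapsedMilliseconds);
        }
        finally
        {
            File.Delete(a);
            File.Delete(b);
        }
    }
}
```

With files this small, disk access and caching are likely to dominate, so it
is worth running the loop a few times and ignoring the first pass.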