Re: File Duplication check
- From: "Nicholas Paldino [.NET/C# MVP]" <mvp@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Thu, 19 Apr 2007 11:17:19 -0400
That's the thing: from a file perspective, the files are different
because the metadata is embedded in the file. If you want to check whether
specific portions of the files are different, then you are going to have
to open each file using Word and compare them word for word, style for
style, etc. Not an easy task.
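As a rough illustration of comparing content rather than raw bytes, here is a minimal Python sketch. It rests on a big assumption that nothing in this thread confirms: that the documents are .docx (Office Open XML) packages, which are zip files with the body text in word/document.xml. It does not apply to the older binary .doc format, and it compares text only, not styles. The names docx_text and docx_text_hash are made up for the example.

```python
import hashlib
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the visible text of a .docx file, one line per paragraph.

    Assumes an Office Open XML package; metadata parts such as
    docProps/core.xml are never read, so they cannot affect the result.
    """
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Concatenate every text run (w:t) inside this paragraph.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def docx_text_hash(path):
    """Hash only the extracted text, ignoring metadata and formatting."""
    return hashlib.sha1(docx_text(path).encode("utf-8")).hexdigest()
```

Two files whose docx_text_hash values match have the same paragraph text, even if their embedded metadata differs; differences in formatting would go unnoticed, so this is weaker than the style-for-style comparison described above.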
--
- Nicholas Paldino [.NET/C# MVP]
- mvp@xxxxxxxxxxxxxxxxxxxxxxxxxxx
<giftson.john@xxxxxxxxx> wrote in message
news:1176994738.686643.251010@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
On Apr 19, 7:37 pm, "Nicholas Paldino [.NET/C# MVP]"
<m...@xxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
John,
Well, using a hash is the right way to go, but I don't understand why
everything gives you different values. I mean, if you have no duplicates,
then yes, you SHOULD get different values.
What you have to do is scan the contents of the directory, hashing each
file as you go. You then store the values of the hashes. While scanning
the directory, you check the value of each hash against the list you have
already compiled. If the hash exists in the list, then the two files could
be duplicates (to be completely accurate, you really have to compare both
files byte by byte at that point to confirm that they are).
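The scan described above could be sketched roughly as follows. This is a Python illustration only, not code from the thread; the names file_hash and find_duplicates are invented for the example, and SHA-1 is just one of the hashes mentioned in the question.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_hash(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(directory):
    """Return (earlier_file, duplicate_file) pairs of byte-identical files."""
    seen = defaultdict(list)  # hash -> distinct files already scanned
    duplicates = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            key = file_hash(path)
            for earlier in seen[key]:
                # Equal hashes only suggest a match; confirm byte by byte.
                if filecmp.cmp(earlier, path, shallow=False):
                    duplicates.append((earlier, path))
                    break
            else:
                # No confirmed match, so remember this file for later checks.
                seen[key].append(path)
    return duplicates
```

Calling find_duplicates on a folder returns pairs of paths whose bytes are identical. Files that differ only in embedded metadata will still hash differently and will not be reported.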
Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- m...@xxxxxxxxxxxxxxxxxxxxxxxxxxx
<giftson.j...@xxxxxxxxx> wrote in message
news:1176983002.868965.88530@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hi,
I am creating an application which migrates all documents from one
repository to another. Before migration I have to verify that all the
documents are unique; no duplicates may be uploaded. Even the document's
created date, modified date, and filename can be different.
How can I find out whether a document is a duplicate?
What I did was create a file, then use Save As to save it to another
location. I am not able to detect that the saved copy is a duplicate. I have
tried MD5 hash, CRC check, and SHA1. Everything gives different values.
Can anyone give me a solution for this?
Thanks in advance.
Giftson John
Hi Nicholas,
I was a bit confused about the MD5 hashing.
Could you please tell me how to compare the contents of Word
documents? What is happening is that MS Word stores a set of metadata in
the file, and even when the file contents are the same, the metadata
difference gives a different MD5 hash value.
Thanks for your help.