Re: Check folder for duplicate files
- From: dpb <none@xxxxxxx>
- Date: Tue, 14 Aug 2007 15:30:41 -0500
Co wrote:
> On 14 aug, 21:03, "Mike Williams" <mi...@xxxxxxxxxxxxxxxxx> wrote:
>> "Co" <vonclausow...@xxxxxxxxx> wrote in message
>> news:1187112810.989536.322340@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>> I need a code to check a folder on my HDD for duplicate files.
>> . . . just to expand on what I said in my previous post (which seems a
>> little ambiguous now that I've looked at it again), if you discover a file
>> called test1.doc and another file called test1.pdf they may in fact
>> represent two totally different documents, regardless of the fact that their
>> names happen to be the same. Also, checking the actual file contents will
>> not produce any meaningful result because they will be completely different.
> The only thing I want to do is check for the names of the files.
> If they match (without the extension) then I have to open them both and
> check if they have the same data. Believe me, in my business it happens.
So, what is the definition of "contain same data"?
Are you asking to examine the contents of a .pdf file as compared to a .doc as compared to a plain text .txt file, and parse them to see if the same words exist in the same order, ignoring all the formatting someone apparently worked hard to add?
Or is it simply that two copies of the _same_ text file have been saved, but one was inadvertently saved w/ an extension that doesn't represent the actual file content?
If the former, you've bitten off a big job, as Mike says: there are any number of ways the same text can be encoded, and you'll have to do a complete lexical parsing of each file format to strip the superfluous information and uncover the fundamental "sameness" underlying it.
If the latter, that's a job for a diff utility, of which there are a zillion, and I'd suggest that using one of them pre-rolled would be the place to start...
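For that second case, here is a minimal sketch in Python, purely for illustration. It assumes "same data" means byte-for-byte identical, that only one folder is scanned (no subfolders), and that names are matched without the extension and ignoring case. The function name same_name_duplicates is made up for this example; filecmp is the standard library's file comparison module.

```python
import filecmp
from collections import defaultdict
from itertools import combinations
from pathlib import Path


def same_name_duplicates(folder):
    """Return pairs of files that share a base name and have identical bytes."""
    by_stem = defaultdict(list)
    for path in Path(folder).iterdir():
        if path.is_file():
            # Group by the name without its last extension, ignoring case.
            by_stem[path.stem.lower()].append(path)

    duplicates = []
    for paths in by_stem.values():
        for a, b in combinations(sorted(paths), 2):
            # shallow=False forces a real content comparison instead of
            # trusting file size and modification time alone.
            if filecmp.cmp(a, b, shallow=False):
                duplicates.append((a, b))
    return duplicates
```

Calling same_name_duplicates on a folder returns a list of (path, path) tuples. It will not match test1.doc against test1.pdf when they merely carry the same text in different formats; that is the first case above and needs format-aware parsing.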
Again, if we're off target, post back w/ more detail as the problem still seems quite poorly defined.