Re: find and remove duplicate files

From: Herbert Kleebauer (klee_at_unibwm.de)
Date: 03/26/04


Date: Fri, 26 Mar 2004 21:16:13 +0100

destatnj2 wrote:

> This is very nice. Could you explain what is going on in the file please.
> What are those strings of characters? Are they random or intentional? What
> is ech.com, md5.com and edl.com?

ech.com, md5.com and edl.com are DOS .com programs which are embedded
in the batch file and written to disk at run time using echo
commands. ech.com is similar to the echo command, but doesn't
append a <CR><LF> (in Win2k you can use set /p instead) and lets you
include binary characters by their hex values (e.g. $0d$0a for
<CR><LF>). md5.com calculates the md5 checksum of the data read
from stdin and writes it to stdout. edl.com is a small list
processing utility which provides some of the "for /f"
functionality in Win9x as well.
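
To make the $hh notation concrete, here is a rough sketch in Python
(not part of the batch file). The function name expand_hex_escapes is
made up for illustration, and it is an assumption that a "$" which is
not followed by two hex digits is passed through unchanged; the real
ech.com may behave differently there.

```python
import re

# "$" followed by exactly two hex digits, e.g. $0d or $3e
_HEX_ESCAPE = re.compile(r"\$([0-9a-fA-F]{2})")

def expand_hex_escapes(text: str) -> bytes:
    """Replace every $hh in text with the single byte 0xhh.

    Assumption: anything that is not a $hh escape is copied as-is.
    The replaced text is not scanned again, so "$24" followed by "0d"
    yields the three literal characters "$0d".
    """
    expanded = _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    return expanded.encode("latin-1")

if __name__ == "__main__":
    # $3e is ">", $0d$0a is <CR><LF>
    print(expand_hex_escapes("echo.$3e$3e_5.bat$0d$0a"))
```

This is why the batch file can write ">>" into a generated file
without the shell treating it as a redirection: it is written as
$3e$3e and only becomes ">>" in the output.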

> > ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >
> > @echo off
> > :: usage: dupli.bat searchdir
> > :: e.g. : dupli.bat c:\*.jpg
> > :: search for duplicate files in searchdir and all subdirs
> > :: check _5.bat for duplicate files
> > :: execute _5.bat to delete all duplicate files
> >
> > ech >_2.bat if (%%a%%)==(%%1) if not (%%b%%)==() echo.$3e$3e_5.bat$0d$0a
> > ech >>_2.bat if (%%a%%)==(%%1) if not (%%b%%)==() echo :: %%b%% $3e$3e_5.bat$0d$0a
> > echo >>_2.bat set b=
> > ech >>_2.bat if (%%a%%)==(%%1) echo del %%2 $3e$3e_5.bat$0d$0a
> > echo >>_2.bat if not (%%a%%)==(%%1) set b=%%2
> > echo >>_2.bat if not (%%a%%)==(%%1) set a=%%1

These lines generate the file _2.bat with the content:

if (%a%)==(%1) if not (%b%)==() echo.>>_5.bat
if (%a%)==(%1) if not (%b%)==() echo :: %b% >>_5.bat
 set b=
if (%a%)==(%1) echo del %2 >>_5.bat
 if not (%a%)==(%1) set b=%2
 if not (%a%)==(%1) set a=%1

_2.bat is later called with parameters like this:

call _2.bat 795a332ab6bdb91ad275f2aa8f675b8b "c:\klee\tmp1\_3.bat"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\a"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\b"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\x"
call _2.bat a7908db6b3ea4c54028fe077c6e3388a "c:\klee\tmp1\edl.com"

If the checksum (the first parameter, %1) is the same as in the
preceding call, the file name (the second parameter, %2) is appended
to the file _5.bat. The first file name of a group of identical files
(same md5 checksum) is prefixed with ":: ", the others are prefixed
with "del ".

_5.bat:
:: "c:\klee\tmp1\a\a"
del "c:\klee\tmp1\a\b"
del "c:\klee\tmp1\x"
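
The same logic, written out as a small Python sketch for readers who
find the %a%/%b% juggling hard to follow. The name plan_deletions is
invented here; last_sum and pending stand in for the environment
variables a and b. Like _2.bat, it writes an empty line before each
group, but it drops the trailing space that the echo commands leave
at the end of each line.

```python
def plan_deletions(entries):
    """entries: (checksum, filename) pairs, already sorted by checksum.

    Returns the lines that would end up in _5.bat.
    """
    lines = []
    last_sum = None   # role of %a%: checksum seen in the previous call
    pending = None    # role of %b%: first file of a group, not yet written
    for checksum, name in entries:
        if checksum == last_sum:
            if pending is not None:
                # second file of a group: emit the kept file as a comment
                lines.append("")
                lines.append(":: " + pending)
            pending = None
            lines.append("del " + name)
        else:
            # new checksum: remember this file, write nothing yet
            pending = name
            last_sum = checksum
    return lines

if __name__ == "__main__":
    sample = [
        ("795a332ab6bdb91ad275f2aa8f675b8b", r'"c:\klee\tmp1\_3.bat"'),
        ("933222b19ff3e7ea5f65517ea1f7d57e", r'"c:\klee\tmp1\a\a"'),
        ("933222b19ff3e7ea5f65517ea1f7d57e", r'"c:\klee\tmp1\a\b"'),
        ("933222b19ff3e7ea5f65517ea1f7d57e", r'"c:\klee\tmp1\x"'),
        ("a7908db6b3ea4c54028fe077c6e3388a", r'"c:\klee\tmp1\edl.com"'),
    ]
    print("\n".join(plan_deletions(sample)))
```

A file with a unique checksum never reaches the output, because its
name is only remembered and then overwritten by the next call.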

> > if exist _.2 del _.2
> > echo generating file list
> > dir /l/b/s/a-d %1 >_.1

This generates the list of files which are checked for identical
contents (/l lower-case names, /b bare format, /s including
subdirectories, /a-d files only).

_.1:
c:\klee\tmp1\x
c:\klee\tmp1\_3.bat
c:\klee\tmp1\edl.com
c:\klee\tmp1\a\a
c:\klee\tmp1\a\b

> > edl "" "ech $240d$22@#0$22 $0d$0amd5 <$22@#0$22$3e$3e_.2$0d$0aecho $22@#0$22$3e$3e_.2"<_.1>_3.bat

This edl command (one long line, watch for line wraps in your news reader)
generates three lines in _3.bat for each file name in _.1. The first
line echoes the file name to the screen, the second calculates the md5
checksum of the file content and appends it to _.2 (without a <CR><LF>)
and the third line writes the file name after the checksum.

_3.bat:
ech $0d"c:\klee\tmp1\x"
md5 <"c:\klee\tmp1\x">>_.2
echo "c:\klee\tmp1\x">>_.2
ech $0d"c:\klee\tmp1\_3.bat"
md5 <"c:\klee\tmp1\_3.bat">>_.2
echo "c:\klee\tmp1\_3.bat">>_.2
ech $0d"c:\klee\tmp1\edl.com"
md5 <"c:\klee\tmp1\edl.com">>_.2
echo "c:\klee\tmp1\edl.com">>_.2
ech $0d"c:\klee\tmp1\a\a"
md5 <"c:\klee\tmp1\a\a">>_.2
echo "c:\klee\tmp1\a\a">>_.2
ech $0d"c:\klee\tmp1\a\b"
md5 <"c:\klee\tmp1\a\b">>_.2
echo "c:\klee\tmp1\a\b">>_.2

> > echo generating checksum
> > call _3.bat

When _3.bat is executed, it generates _.2:

_.2:
933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\x"
795a332ab6bdb91ad275f2aa8f675b8b "c:\klee\tmp1\_3.bat"
a7908db6b3ea4c54028fe077c6e3388a "c:\klee\tmp1\edl.com"
933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\a"
933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\b"
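
For comparison, one line of _.2 can be produced in Python like this.
It is only a sketch using the standard hashlib module, not what
md5.com does internally; the function name checksum_line is made up,
and the line layout (checksum, one space, quoted file name) is simply
copied from the listing above.

```python
import hashlib
from pathlib import Path

def checksum_line(path):
    """Return '<md5 hex digest> "<path>"' for the file at path."""
    # Reads the whole file into memory; fine for a sketch, a real tool
    # would feed large files to hashlib in chunks.
    digest = hashlib.md5(Path(path).read_bytes()).hexdigest()
    return f'{digest} "{path}"'

if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(checksum_line(name))
```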

> > echo.
> > echo sorting
> > sort <_.2 | edl "" "call _2.bat @#0">_4.bat

_.2 is sorted (so identical files are listed consecutively) and
each line is prefixed with "call _2.bat "

_4.bat:
call _2.bat 795a332ab6bdb91ad275f2aa8f675b8b "c:\klee\tmp1\_3.bat"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\a"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\b"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\x"
call _2.bat a7908db6b3ea4c54028fe077c6e3388a "c:\klee\tmp1\edl.com"
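
The sort step is what makes the simple "same checksum as the previous
call" test sufficient. A minimal Python sketch of this step follows;
build_call_script is a made-up name, and Python's plain string
ordering is assumed to be close enough to the DOS sort command for
lower-case hex checksums and lower-case paths.

```python
def build_call_script(listing_lines):
    """Sort the '<checksum> "<file>"' lines and prefix each with a call."""
    # Lines start with the checksum, so sorting whole lines puts equal
    # checksums next to each other.
    return ["call _2.bat " + line for line in sorted(listing_lines)]

if __name__ == "__main__":
    listing = [
        r'933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\x"',
        r'795a332ab6bdb91ad275f2aa8f675b8b "c:\klee\tmp1\_3.bat"',
        r'a7908db6b3ea4c54028fe077c6e3388a "c:\klee\tmp1\edl.com"',
        r'933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\a"',
        r'933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\b"',
    ]
    print("\n".join(build_call_script(listing)))
```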

> > set a=
> > set b=
> > if exist _5.bat del _5.bat
> > echo find duplicate files
> > call _4.bat

This executes _2.bat for each file. All duplicate files are
written to _5.bat, the first prefixed with ":: ", the others
prefixed with "del " (explained above).

> > echo check _5.bat for duplicate files in %1
> > for %%i in (ech edl md5) do del %%i.com
> > for %%i in (1 2) do if exist _.%%i del _.%%i
> > for %%i in (1 2 3 4) do if exist _%%i.bat del _%%i.bat

This cleans up the temporary files.

Here is the description of edl.com:

> > ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >
> > :: usage: edl "string1" "string2" <infile >outfile
> > :: replaces any non empty line in infile by string2
> > :: (a line is non empty if it contains at least one
> > :: character greater 0x20) and writes it to outfile.
> > ::
> > :: Any character in string1 separates words
> > ::
> > :: string2 can contain:
> > :: $00-$ff : hexbytes
> > :: $:abcd : input line [ab:cd] ab,cd hex values
> > :: $#0 : complete input line
> > :: $#n (n=1..8) : n. word in input line
> > :: $#9 : 9. word (or last word if there are
> > :: more than 9 words) in input line
> > :: $l : line till first separator char
> > :: $L : line till last separator char
> > :: $r : line after first separator char
> > :: $R : line after last separator char
> > :: $+ : increment number before $+
> > :: $- : decrement number before $-
> > :: $tY : year (upper 2 digits)
> > :: $ty : year (lower 2 digits)
> > :: $tm : month
> > :: $td : day
> > :: $tH : hour
> > :: $tM : minute
> > :: $tS : second
> > ::
> > :: instead of $ you can also use @, then any % is doubled


