Re: find and remove duplicate files
From: Herbert Kleebauer (klee_at_unibwm.de)
Date: 03/26/04
- Next message: Russell Styles: "Re: command prompt window > 50 lines"
- Previous message: Rob P: "Re: Re: Logon Script output log file"
- In reply to: destatnj2: "Re: find and remove duplicate files"
- Next in thread: Gary Smith: "Re: find and remove duplicate files"
- Messages sorted by: [ date ] [ thread ]
Date: Fri, 26 Mar 2004 21:16:13 +0100
destatnj2 wrote:
> This is very nice. Could you explain what is going on in the file please.
> What are those strings of characters? Are they random or intentional? What
> is ech.com, md5.com and edl.com?
ech.com, md5.com and edl.com are DOS .com programs which are embedded
in the batch file and written to disk at run time using echo
commands. ech.com is similar to the echo command, but it doesn't
append a <CR><LF> (in Win2k you can use set /p instead) and it lets
you include binary characters by giving their hex values (e.g. $0d$0a
for <CR><LF>). md5.com calculates the md5 checksum of the data read
from stdin and writes it to stdout. edl.com is a small list
processing utility which provides some of the "for /f" functionality
in Win9x as well.
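To make the $xx notation used by ech.com concrete, here is a minimal
Python sketch of the hex expansion. It is not the code of ech.com (that
is a DOS binary); the name expand_hex_escapes is made up for this
illustration, and leaving a $ that is not followed by two hex digits
unchanged is an assumption, not documented behaviour.

```python
import re


def expand_hex_escapes(text: str) -> bytes:
    """Replace every $xx (two hex digits) by the byte with that value.

    Hypothetical helper: it only mimics the notation described for ech.com.
    """
    out = bytearray()
    pos = 0
    for match in re.finditer(r"\$([0-9a-fA-F]{2})", text):
        # Copy the literal text in front of the escape unchanged.
        out += text[pos:match.start()].encode("latin-1")
        # Append the single byte named by the two hex digits.
        out.append(int(match.group(1), 16))
        pos = match.end()
    out += text[pos:].encode("latin-1")
    return bytes(out)


# $3e is ">", $0d$0a is <CR><LF>; $24 is "$", so "$240d" gives a literal "$0d".
print(expand_hex_escapes("echo.$3e$3e_5.bat$0d$0a"))
print(expand_hex_escapes("$240d"))
```

The output is not scanned again, which is why $240d can be used to
write a literal $0d into a generated batch file.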
> > ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >
> > @echo off
> > :: usage: dupli.bat searchdir
> > :: e.g. : dupli.bat c:\*.jpg
> > :: search for duplicate files in searchdir and all subdirs
> > :: check _5.bat for duplicate files
> > :: execute _5.bat to delete all duplicate files
> >
> > ech >_2.bat if (%%a%%)==(%%1) if not (%%b%%)==() echo.$3e$3e_5.bat$0d$0a
> > ech >>_2.bat if (%%a%%)==(%%1) if not (%%b%%)==() echo :: %%b%% $3e$3e_5.bat$0d$0a
> > echo >>_2.bat set b=
> > ech >>_2.bat if (%%a%%)==(%%1) echo del %%2 $3e$3e_5.bat$0d$0a
> > echo >>_2.bat if not (%%a%%)==(%%1) set b=%%2
> > echo >>_2.bat if not (%%a%%)==(%%1) set a=%%1
These lines generate the file _2.bat with the content:
if (%a%)==(%1) if not (%b%)==() echo.>>_5.bat
if (%a%)==(%1) if not (%b%)==() echo :: %b% >>_5.bat
set b=
if (%a%)==(%1) echo del %2 >>_5.bat
if not (%a%)==(%1) set b=%2
if not (%a%)==(%1) set a=%1
_2.bat is later called with parameters like this:
call _2.bat 795a332ab6bdb91ad275f2aa8f675b8b "c:\klee\tmp1\_3.bat"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\a"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\b"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\x"
call _2.bat a7908db6b3ea4c54028fe077c6e3388a "c:\klee\tmp1\edl.com"
If the checksum (the first parameter, %1) is the same as in the
previous call, the file name (the second parameter, %2) is appended
to the file _5.bat. The first file name of a group of identical files
(same md5 checksum) is prefixed with ":: ", the others are prefixed
with "del ".
_5.bat:
:: "c:\klee\tmp1\a\a"
del "c:\klee\tmp1\a\b"
del "c:\klee\tmp1\x"
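The logic of _2.bat can also be written down as a small Python sketch.
This is only an illustration and not part of the batch file: the name
find_duplicates is made up, the variables a and b play the role of the
environment variables %a% and %b%, and the trailing space that the echo
commands write is left out.

```python
def find_duplicates(sorted_pairs):
    """Mimic successive calls of _2.bat; return the lines written to _5.bat.

    sorted_pairs: (checksum, file name) tuples, already sorted by checksum.
    """
    lines = []
    a = ""  # checksum seen in the previous call (%a%)
    b = ""  # first file name of a group, until it has been printed (%b%)
    for checksum, name in sorted_pairs:
        same = (a == checksum)
        if same and b != "":
            # Second file of a group: emit a blank line and the first
            # file name as a comment.
            lines.append("")
            lines.append(":: " + b)
        b = ""
        if same:
            lines.append("del " + name)
        else:
            # New checksum: remember this file as a possible group leader.
            b = name
            a = checksum
    return lines


pairs = [
    ("795a332ab6bdb91ad275f2aa8f675b8b", r'"c:\klee\tmp1\_3.bat"'),
    ("933222b19ff3e7ea5f65517ea1f7d57e", r'"c:\klee\tmp1\a\a"'),
    ("933222b19ff3e7ea5f65517ea1f7d57e", r'"c:\klee\tmp1\a\b"'),
    ("933222b19ff3e7ea5f65517ea1f7d57e", r'"c:\klee\tmp1\x"'),
    ("a7908db6b3ea4c54028fe077c6e3388a", r'"c:\klee\tmp1\edl.com"'),
]
for line in find_duplicates(pairs):
    print(line)
```

The input has to be sorted by checksum, because only the checksum of
the directly preceding call is remembered.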
> > if exist _.2 del _.2
> > echo generating file list
> > dir /l/b/s/a-d %1 >_.1
This generates the list of files whose contents are to be compared
(file names only, including all subdirectories).
_.1:
c:\klee\tmp1\x
c:\klee\tmp1\_3.bat
c:\klee\tmp1\edl.com
c:\klee\tmp1\a\a
c:\klee\tmp1\a\b
> > edl "" "ech $240d$22@#0$22 $0d$0amd5 <$22@#0$22$3e$3e_.2$0d$0aecho $22@#0$22$3e$3e_.2"<_.1>_3.bat
This edl command (one long line, watch for line wraps in your news reader)
generates three lines in _3.bat for each file name in _.1. The first
line echoes the file name to the screen, the second calculates the md5
checksum of the file content and appends it to _.2 (without a <CR><LF>),
and the third appends the file name to _.2 after the checksum.
_3.bat:
ech $0d"c:\klee\tmp1\x"
md5 <"c:\klee\tmp1\x">>_.2
echo "c:\klee\tmp1\x">>_.2
ech $0d"c:\klee\tmp1\_3.bat"
md5 <"c:\klee\tmp1\_3.bat">>_.2
echo "c:\klee\tmp1\_3.bat">>_.2
ech $0d"c:\klee\tmp1\edl.com"
md5 <"c:\klee\tmp1\edl.com">>_.2
echo "c:\klee\tmp1\edl.com">>_.2
ech $0d"c:\klee\tmp1\a\a"
md5 <"c:\klee\tmp1\a\a">>_.2
echo "c:\klee\tmp1\a\a">>_.2
ech $0d"c:\klee\tmp1\a\b"
md5 <"c:\klee\tmp1\a\b">>_.2
echo "c:\klee\tmp1\a\b">>_.2
> > echo generating checksum
> > call _3.bat
When _3.bat is executed, it generates the file _.2.
_.2:
933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\x"
795a332ab6bdb91ad275f2aa8f675b8b "c:\klee\tmp1\_3.bat"
a7908db6b3ea4c54028fe077c6e3388a "c:\klee\tmp1\edl.com"
933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\a"
933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\b"
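For comparison, one line of _.2 can be produced in Python with the
standard hashlib module. This is a sketch of the same idea, not what
md5.com does internally; the name checksum_line and the chunk size are
choices made for this example.

```python
import hashlib


def checksum_line(path: str) -> str:
    """Return 'md5-checksum "path"', the layout of one line in _.2."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files need not fit into memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() + ' "' + path + '"'
```

Two files with the same content give the same checksum, so their lines
in _.2 start with the same 32 hex digits.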
> > echo.
> > echo sorting
> > sort <_.2 | edl "" "call _2.bat @#0">_4.bat
_.2 is sorted (so that identical files are listed consecutively) and
each line is prefixed with "call _2.bat ".
_4.bat:
call _2.bat 795a332ab6bdb91ad275f2aa8f675b8b "c:\klee\tmp1\_3.bat"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\a"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\b"
call _2.bat 933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\x"
call _2.bat a7908db6b3ea4c54028fe077c6e3388a "c:\klee\tmp1\edl.com"
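Why sorting is enough to bring identical files together can be shown
with a short Python sketch. It uses sorted and itertools.groupby instead
of the DOS sort command and _2.bat, so it is a different technique with
the same effect; the name duplicate_groups is made up, and Python sorts
by code point while the DOS sort may order some characters differently.

```python
from itertools import groupby


def duplicate_groups(lines):
    """Group 'checksum "name"' lines by checksum; keep groups of 2 or more."""
    def checksum_of(line):
        return line.split(" ", 1)[0]

    groups = []
    # After sorting, lines with the same checksum are adjacent,
    # which is what groupby needs.
    for checksum, members in groupby(sorted(lines), key=checksum_of):
        names = [member.split(" ", 1)[1] for member in members]
        if len(names) > 1:
            groups.append((checksum, names))
    return groups


lines = [
    r'933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\x"',
    r'795a332ab6bdb91ad275f2aa8f675b8b "c:\klee\tmp1\_3.bat"',
    r'a7908db6b3ea4c54028fe077c6e3388a "c:\klee\tmp1\edl.com"',
    r'933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\a"',
    r'933222b19ff3e7ea5f65517ea1f7d57e "c:\klee\tmp1\a\b"',
]
for checksum, names in duplicate_groups(lines):
    print(checksum, names)
```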
> > set a=
> > set b=
> > if exist _5.bat del _5.bat
> > echo find duplicate files
> > call _4.bat
This executes _2.bat for each file. All duplicate files are
written to _5.bat, the first prefixed with ":: ", the others
prefixed with "del " (explained above).
> > echo check _5.bat for duplicate files in %1
> > for %%i in (ech edl md5) do del %%i.com
> > for %%i in (1 2) do if exist _.%%i del _.%%i
> > for %%i in (1 2 3 4) do if exist _%%i.bat del _%%i.bat
This cleans up the temporary files.
Here is the description of edl.com:
> > ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> >
> > :: usage: edl "string1" "string2" <infile >outfile
> > :: replaces any non empty line in infile by string2
> > :: (a line is non empty if it contains at least one
> > :: character greater 0x20) and writes it to outfile.
> > ::
> > :: Any character in string1 separates words
> > ::
> > :: string2 can contain:
> > :: $00-$ff : hexbytes
> > :: $:abcd : input line [ab:cd] ab,cd hex values
> > :: $#0 : complete input line
> > :: $#n (n=1..8) : n. word in input line
> > :: $#9 : 9. word (or last word if there are
> > :: more than 9 words) in input line
> > :: $l : line till first separator char
> > :: $L : line till last separator char
> > :: $r : line after first separator char
> > :: $R : line after last separator char
> > :: $+ : increment number before $+
> > :: $- : decrement number before $-
> > :: $tY : year (upper 2 digits)
> > :: $ty : year (lower 2 digits)
> > :: $tm : month
> > :: $td : day
> > :: $tH : hour
> > :: $tM : minute
> > :: $tS : second
> > ::
> > :: instead of $ you can also use @, then any % is doubled
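To illustrate the placeholder notation, here is a Python sketch of a
small subset of it: $00-$ff, $#0 and $#1 to $#8, plus the @ variant.
It is not the source of edl.com. The names expand_template and
process_lines are made up, and several details are assumptions: runs of
separator characters are treated as one separator, a missing word
expands to nothing, and the % doubling of @ is applied only to text
taken from the input line.

```python
import re

# "$" or "@", followed by "#digit" or by two hex digits.
TOKEN = re.compile(r"([$@])(#[0-8]|[0-9a-fA-F]{2})")


def split_words(line, separators):
    """Split line at any character contained in separators."""
    words, current = [], ""
    for char in line:
        if char in separators:
            if current:
                words.append(current)
            current = ""
        else:
            current += char
    if current:
        words.append(current)
    return words


def expand_template(template, line, separators=""):
    """Expand the supported placeholders of template for one input line."""
    words = split_words(line, separators)

    def substitute(match):
        marker, body = match.group(1), match.group(2)
        if not body.startswith("#"):
            return chr(int(body, 16))          # $xx: hex byte
        index = int(body[1])
        if index == 0:
            value = line                       # $#0: complete line
        else:
            value = words[index - 1] if index <= len(words) else ""
        # With @ instead of $, every % of the inserted text is doubled.
        return value.replace("%", "%%") if marker == "@" else value

    return TOKEN.sub(substitute, template)


def process_lines(template, lines, separators=""):
    """Expand the template for every line with a character above 0x20."""
    return [expand_template(template, line, separators)
            for line in lines if any(ord(char) > 0x20 for char in line)]


print(process_lines("call _2.bat @#0", ["abc def", "   ", "100% sure"]))
```

With an empty string1, as in the two edl calls of dupli.bat, there are
no separators and only the complete line (@#0) is of use.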