Re: compare file content
From: Larry Serflaten (serflaten_at_usinternet.com)
Date: Thu, 2 Sep 2004 16:20:02 -0500
"Kevin O'Brien" <email@example.com> wrote
> Hi guys,
> I have no idea how to approach this one. I have two files with the following
> File1 File2
> ----- -----
> 017D 017D,NeverEstab
> 017E 017E,NeverEstab
> 021A 017F,NeverEstab
> 021B 0180,NeverEstab
> 021C 0182,NeverEstab
> What I need to do is, for example, 017D and 017E in file1 appear in file2 so
> I want to return those. 021A, 021B and so on do not appear in file 2 so I
> want to skip them. The files are hundreds of lines long.
> Any help would be greatly appreciated!
It appears the first file can have values from 0000 to FFFF.
If that is the case you might build a buffer of 4K (4096) 16-bit
values, where each bit represents one of the values within that
FFFF range. (4K * 16 = 64K bits, one per possible value)
When you get a value from the file, set the corresponding bit to
1, leaving the rest at 0.
You can then loop through the second file, extract each value,
and check whether its corresponding bit is 0 (not present) or
1 (present).
The buffer is laid out so that the first 16-bit value of the 4K
buffer records the presence of values 0000 to 000F. The last
16-bit value of the 4K buffer represents the values FFF0 to FFFF.
To find the bit offset you isolate the last hex digit (And it with
&H000F) and use Mask = 2 ^ Digit, where Mask is the bit mask needed
to test (or set) the bit, and Digit is that last digit value. To find
the array offset, integer-divide the value by 16 (value \ 16).
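For example, to mark 01A2 as present in the first file, the
commands would be:
' convert from hex string to a value (the trailing & forces a Long,
' so 8000 and above do not come out negative)
X = Val("&H" & "01A2" & "&")
' Set bit to 1 (with Buf As Integer, bit 15 would overflow on
' assignment; declaring Buf As Long avoids that)
Buf(X \ 16) = Buf(X \ 16) Or (2 ^ (X And &H000F))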
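The matching test for a value from the second file uses And in place of Or. Here is a minimal sketch of that check, assuming Buf is declared As Long (using only the low 16 bits of each element) so that bit 15 cannot overflow. HexToLong, IsPresent and Demo are hypothetical names chosen for illustration, not anything from the original post.

```vb
Option Explicit

' Hypothetical helper: "01A2" -> 418. The trailing "&" makes Val read
' the hex text as a Long, so "8000" and above stay positive.
Private Function HexToLong(ByVal s As String) As Long
    HexToLong = CLng(Val("&H" & s & "&"))
End Function

' True when the bit that stands for X is set in Buf.
Private Function IsPresent(Buf() As Long, ByVal X As Long) As Boolean
    IsPresent = (Buf(X \ 16) And CLng(2 ^ (X And &HF))) <> 0
End Function

Public Sub Demo()
    Dim Buf(0 To 4095) As Long   ' 4096 elements * 16 bits each = 65536 flags
    Dim X As Long

    ' Mark 01A2 as present: element 26, bit 2
    X = HexToLong("01A2")
    Buf(X \ 16) = Buf(X \ 16) Or CLng(2 ^ (X And &HF))

    Debug.Print IsPresent(Buf, X)                   ' True
    Debug.Print IsPresent(Buf, HexToLong("01A3"))   ' False
End Sub
```

Passing the array into IsPresent keeps the sketch free of module-level state; in a real program Buf would more likely live at module level.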
Using a look-up table of bit masks would eliminate the
exponentiation for each bit:
Buf(X \ 16) = Buf(X \ 16) Or BitTable(X And &H000F)
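Putting the pieces together, here is a rough sketch of the look-up table and the two passes over the files. It assumes each line starts with a four-digit hex value (anything after it, such as ",NeverEstab", is ignored) and that Buf and BitTable are Long arrays. KeyOf, InitBitTable and CommonValues are hypothetical names, and error handling for missing files is left out.

```vb
Option Explicit

Private Buf(0 To 4095) As Long      ' 4096 elements, low 16 bits of each used
Private BitTable(0 To 15) As Long   ' BitTable(n) holds 2 ^ n

Private Sub InitBitTable()
    Dim i As Long
    BitTable(0) = 1
    For i = 1 To 15
        BitTable(i) = BitTable(i - 1) * 2
    Next i
End Sub

' First four characters of the line read as hex; -1 if the line is too short.
Private Function KeyOf(ByVal s As String) As Long
    s = Trim$(s)
    If Len(s) < 4 Then
        KeyOf = -1
    Else
        KeyOf = CLng(Val("&H" & Left$(s, 4) & "&"))
    End If
End Function

' Returns the values from File2 whose bit was set while reading File1.
Public Function CommonValues(ByVal File1 As String, ByVal File2 As String) As Collection
    Dim Found As New Collection
    Dim f As Integer, s As String, X As Long

    InitBitTable
    Erase Buf                        ' reset every flag to 0

    ' Pass 1: set a bit for every value in the first file
    f = FreeFile
    Open File1 For Input As #f
    Do While Not EOF(f)
        Line Input #f, s
        X = KeyOf(s)
        If X >= 0 Then Buf(X \ 16) = Buf(X \ 16) Or BitTable(X And &HF)
    Loop
    Close #f

    ' Pass 2: keep each value from the second file whose bit is set
    f = FreeFile
    Open File2 For Input As #f
    Do While Not EOF(f)
        Line Input #f, s
        X = KeyOf(s)
        If X >= 0 Then
            If (Buf(X \ 16) And BitTable(X And &HF)) <> 0 Then
                Found.Add Left$(Trim$(s), 4)
            End If
        End If
    Loop
    Close #f

    Set CommonValues = Found
End Function
```

With the sample data from the question this should return 017D and 017E. A value repeated in the second file would be added once per occurrence.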
You might look at it as using an array to denote if any value is
present from 0000 to FFFF, but using one element for each
value would require an array of 65536 elements (64K). Since
you only need to know a yes or no condition, you can use
individual bits of the elements to record the data. With each
element able to hold 16 bits (Integer) or 32 bits (Long) you
can lessen the memory requirement by using those individual