Re: compare file content

From: Larry Serflaten (serflaten_at_usinternet.com)
Date: 09/02/04


Date: Thu, 2 Sep 2004 16:20:02 -0500


"Kevin O'Brien" <kevin.obrien@henryschein.com> wrote
> Hi guys,
>
> I have no idea how to approach this one. I have two files with the following
> format:
>
> File1   File2
> -----   -----
> 017D    017D,NeverEstab
> 017E    017E,NeverEstab
> 021A    017F,NeverEstab
> 021B    0180,NeverEstab
> 021C    0182,NeverEstab
> 021D
>
> What I need to do is, for example, 017D and 017E in file1 appear in file2 so
> I want to return those. 021A, 021B and so on do not appear in file 2 so I
> want to skip them. The files are hundreds of lines long.
>
> Any help would be greatly appreciated!

It appears the first file can have values from 0000 to FFFF.
If that is the case, you might build a buffer of 4K (4096) 16-bit
values, where each bit represents one of the values within that
0000 to FFFF range (4K * 16 = 64K bits, or 8 KB of memory).

When you get a value from the file, set the corresponding bit to
1, leaving the rest at 0.

You can then loop through the second file, extract each value,
and check whether its corresponding bit is 0 (not present) or
1 (matches).

The first 16-bit value of the 4K buffer would then record
the presence of values 0000 to 000F. The last 16-bit value of
the 4K buffer would represent the values FFF0 to FFFF.

To find the bit offset you simply isolate the last hex digit
(Digit = Value And &H000F) and use Mask = 2 ^ Digit, where Mask
is the bit mask needed to test (or set) the bit. To find the
array offset, integer-divide the value by 16 (Value \ 16).
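
As a quick check of that arithmetic, here is a minimal sketch that prints the array offset and bit number for a few values. ShowOffsets is a made-up name for illustration, and the trailing "&" in the hex text is my assumption about how you would want to parse it (it makes VB read the value as a Long, so FFFF comes out as 65535 rather than -1).

```vb
' Sketch only: ShowOffsets is a hypothetical helper, not from the original post.
Sub ShowOffsets()
    Dim v As Variant
    Dim X As Long

    For Each v In Array("0000", "000F", "01A2", "FFF0", "FFFF")
        ' Parse the four hex digits as a Long in the range 0..65535
        X = Val("&H" & v & "&")
        ' X \ 16 is the array offset, X And &HF is the bit number (0..15)
        Debug.Print v; " -> element"; X \ 16; ", bit"; X And &HF
    Next v
End Sub
```

By hand: 0000 and 000F both land in element 0 (bits 0 and 15), 01A2 is 418 decimal and lands in element 26 at bit 2, and FFF0 and FFFF both land in element 4095 (bits 0 and 15).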

For example, suppose you want to set 01A2 as being present
in the first file, the commands to do that would be:

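  ' Convert from hex string to a value (the trailing "&" makes it a Long)
  X = Val("&H" & "01A2" & "&")
  ' Set bit to 1. Note: with Integer elements, digit F needs &H8000
  ' here, because 2 ^ 15 = 32768 overflows a 16-bit Integer.
  Buf(X \ 16) = Buf(X \ 16) Or (2 ^ (X And &H000F))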
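
The test for the second file is the mirror image of the set operation: same offset, same mask, but And instead of Or. A rough sketch follows; IsPresent is a name I made up, and the Buf declaration is assumed to match the 4K buffer of 16-bit values described above.

```vb
' Sketch only: IsPresent is a hypothetical helper, and this declaration of
' Buf is an assumption based on the 4K buffer of 16-bit values described.
Private Buf(0 To 4095) As Integer

Function IsPresent(ByVal HexText As String) As Boolean
    Dim X As Long

    ' Parse the four hex digits as a Long in the range 0..65535
    X = Val("&H" & HexText & "&")
    ' And isolates the single bit; a non-zero result means it was set
    IsPresent = (Buf(X \ 16) And CLng(2 ^ (X And &HF))) <> 0
End Function
```

Testing a bit this way never assigns back into the Integer element, so it should not hit the overflow that setting bit 15 can cause.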

Using a look-up table of bit masks would eliminate the
exponentiation for each bit:

  Buf(X \ 16) = Buf(X \ 16) Or BitTable(X And &H000F)
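
One way the table might be filled, as a sketch only. InitBitTable is a hypothetical name, and making the elements Integer is my assumption to match the 16-bit buffer. The top bit is written as a hex literal because 2 ^ 15 does not fit in a signed 16-bit Integer.

```vb
' Sketch only: the post names BitTable but does not show how it is filled.
Private BitTable(0 To 15) As Integer

Sub InitBitTable()
    Dim i As Long

    For i = 0 To 14
        BitTable(i) = 2 ^ i      ' 1, 2, 4, ... 16384
    Next i
    ' 2 ^ 15 = 32768 overflows an Integer, so use the literal for the top bit
    BitTable(15) = &H8000
End Sub
```

Call InitBitTable once before the first lookup.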

You might look at it as using an array to denote if any value is
present from 0000 to FFFF, but using one element for each
value would require an array of 65536 elements (64K). Since
you only need to know a yes or no condition, you can use
individual bits of the elements to record the data. With each
element able to hold 16 bits (Integer) or 32 bits (Long), you
can cut the number of elements by a factor of 16 or 32 by
using those individual bits.
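
Putting the pieces together, the whole comparison might look something like the sketch below. Everything named here (CompareFiles, KeyOf, InitBitTable, the path arguments) is my own invention for illustration, and it assumes every non-blank line in both files starts with four hex digits, as in the sample data.

```vb
Option Explicit

' Sketch only: all names and the file layout are assumptions.
Private Buf(0 To 4095) As Integer      ' 4096 * 16 bits, one flag per value
Private BitTable(0 To 15) As Integer   ' bit masks for hex digits 0..F

Private Sub InitBitTable()
    Dim i As Long
    For i = 0 To 14
        BitTable(i) = 2 ^ i
    Next i
    BitTable(15) = &H8000              ' 2 ^ 15 would overflow an Integer
End Sub

' Read the leading four hex digits of a line as a Long (0..65535).
Private Function KeyOf(ByVal TextLine As String) As Long
    KeyOf = Val("&H" & Left$(TextLine, 4) & "&")
End Function

Public Sub CompareFiles(ByVal Path1 As String, ByVal Path2 As String)
    Dim f As Integer
    Dim s As String
    Dim X As Long

    Erase Buf                          ' reset every flag to 0
    InitBitTable

    ' Pass 1: flag every value found in the first file
    f = FreeFile
    Open Path1 For Input As #f
    Do While Not EOF(f)
        Line Input #f, s
        If Len(s) >= 4 Then
            X = KeyOf(s)
            Buf(X \ 16) = Buf(X \ 16) Or BitTable(X And &HF)
        End If
    Loop
    Close #f

    ' Pass 2: print each line of the second file whose value was flagged
    f = FreeFile
    Open Path2 For Input As #f
    Do While Not EOF(f)
        Line Input #f, s
        If Len(s) >= 4 Then
            X = KeyOf(s)
            If (Buf(X \ 16) And BitTable(X And &HF)) <> 0 Then Debug.Print s
        End If
    Loop
    Close #f
End Sub
```

With the sample files from the question, this should print the 017D and 017E lines of the second file and skip the rest. Lines that do not start with hex digits would be read as 0000, so filter those out first if your files have headers.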

HTH
LFS


