Re: Text Script
- From: "Jim Vierra" <jvierra@xxxxxxx>
- Date: Sun, 8 May 2005 12:30:10 -0400
Al,
I agree that simplicity is the real issue. I suppose it's really a matter
of what you are comfortable with. Because the "Key" is so small, the
dictionary hash is not much different in speed, but it is a simple way of
doing it.
The point I wanted to make, primarily, is that re-reading a stream has no
more overhead than reading from an array as it is an in-memory operation.
String matches for short strings, less than 1K, are very fast in VBS.
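
A rough sketch of the re-read approach, for illustration only. It assumes a
made-up layout with the 12-digit account number at the start of every line
(the real files keep it elsewhere), and the sample lines and the helper names
WriteLines and FindInFile are hypothetical. The FileSystemObject TextStream
has no method for seeking back to the start, so "rewinding" here means
re-opening the file:

```vbscript
Option Explicit
' Sketch only: sample data and field positions are invented for illustration.
Const ForReading = 1
Const KEY_LEN = 12          ' assumed: key sits in the first 12 characters

Dim fso, path1, path2, f2, line2, key, match
Set fso = CreateObject("Scripting.FileSystemObject")

' Create two small temporary files to stand in for file1 and file2.
path1 = fso.BuildPath(fso.GetSpecialFolder(2).Path, fso.GetTempName)
path2 = fso.BuildPath(fso.GetSpecialFolder(2).Path, fso.GetTempName)
WriteLines path1, Array("000000000001 100.00", "000000000002 250.50")
WriteLines path2, Array("000000000002 SMITH", "000000000003 JONES")

' For every line of file2, re-open file1 and scan it from the top.
Set f2 = fso.OpenTextFile(path2, ForReading)
Do While Not f2.AtEndOfStream
    line2 = f2.ReadLine
    key = Left(line2, KEY_LEN)
    match = FindInFile(path1, key)
    If match <> "" Then
        WScript.Echo key & " " & Trim(Mid(match, KEY_LEN + 1)) & " " & Trim(Mid(line2, KEY_LEN + 1))
    Else
        WScript.Echo key & " not found in file1"
    End If
Loop
f2.Close

fso.DeleteFile path1
fso.DeleteFile path2

' Write each element of an array to a new text file, one per line.
Sub WriteLines(path, lines)
    Dim ts, i
    Set ts = fso.CreateTextFile(path, True)
    For i = 0 To UBound(lines)
        ts.WriteLine lines(i)
    Next
    ts.Close
End Sub

' Re-open the file and return the first line whose key matches, or "".
Function FindInFile(path, key)
    Dim ts, line
    FindInFile = ""
    Set ts = fso.OpenTextFile(path, ForReading)
    Do While Not ts.AtEndOfStream
        line = ts.ReadLine
        If Left(line, KEY_LEN) = key Then
            FindInFile = line
            Exit Do
        End If
    Loop
    ts.Close
End Function
```

Each lookup here is a linear scan, so the cost grows with the number of lines
in file1 times the number of lines in file2.
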
For each lookup of a key in the Dictionary we have to generate the hash for
the "query" value. We also have to generate an index table when building the
Dictionary. For small files this adds a lot of overhead. For large files it
may gain us speed down the road.
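
For comparison, a minimal sketch of the Dictionary version, again assuming a
hypothetical layout with the 12-digit account number in the first 12
characters. The sample lines are invented and held in arrays rather than read
from disk:

```vbscript
Option Explicit
' Sketch only: sample data and field positions are invented for illustration.
Const KEY_LEN = 12          ' assumed: key sits in the first 12 characters

Dim file1Lines, file2Lines, dict, line, key
file1Lines = Array("000000000001 100.00", "000000000002 250.50")
file2Lines = Array("000000000002 SMITH", "000000000003 JONES")

' Pass 1: index file1 by account number (the hash table is built here).
Set dict = CreateObject("Scripting.Dictionary")
For Each line In file1Lines
    dict(Left(line, KEY_LEN)) = Trim(Mid(line, KEY_LEN + 1))
Next

' Pass 2: one hashed lookup per line of file2 instead of a rescan of file1.
For Each line In file2Lines
    key = Left(line, KEY_LEN)
    If dict.Exists(key) Then
        WScript.Echo key & " " & dict(key) & " " & Trim(Mid(line, KEY_LEN + 1))
    Else
        WScript.Echo key & " not found in file1"
    End If
Next
```

The build pass is the up-front cost mentioned above. Whether it pays for
itself at a few hundred lines is exactly what a timing test would have to
show.
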
In the end, as I stated above, whatever seems easiest for the user is going
to be the best approach.
--
Jim Vierra
"Al Dunbar [MS-MVP]" <alan-no-drub-spam@xxxxxxxxxxx> wrote in message
news:%23aKKBv9UFHA.3312@xxxxxxxxxxxxxxxxxxxxxxx
>
> "Jim Vierra" <jvierra@xxxxxxx> wrote in message
> news:eKlmnEgUFHA.3312@xxxxxxxxxxxxxxxxxxxxxxx
>> For a file with a few hundred lines the buffering mechanism in W2K and
>> after will be faster than an array.
>
> Having not done any timing tests I'm not sure about *faster*, but if a
> file can be re-read from a system buffer, this will no doubt improve the
> performance of the "brute force" approach.
>
>> VBScript arrays of more than a few tens of
>> lines tend to be slow. Dictionaries are even slower. Rewinding a file is
>> instantaneous and the file is already in memory after the first pass.
>
> Rewinding a file might indeed be instantaneous as you say, but you are
> comparing apples to oranges here. After rewinding the file, it must be
> processed again, line by line, and compared against some value to find a
> match. The rewind part of this operation becomes insignificant next to the
> number of comparisons. IMHO, this type of operation is where the dictionary
> object becomes more efficient.
>
>> RegEx can match by template for file1 and the retrieved value can be
>> quickly matched against file2.
>>
>> Using arrays is nice but I don't see any real performance gains due to
>> array behavior and slow dictionary response.
>
> In all honesty, I was not considering so much the runtime efficiency as
> the design time efficiency, and the simplicity of using the dictionary
> object. Of course, design time efficiency may be in the eye of the
> beholder - there is no point in using someone else's approach if you think
> it is not the most natural one.
>
>> I will say this. A dictionary could be a nice method for comparing but
>> the number of comparisons is still the same.
>
> Not necessarily. The dictionary object uses a hashing technique so that,
> for example:
>
> if dicob.exists(someKey)
>
> does NOT compare all of the keys against the value given.
>
> In your method, and even given that the files are only read from disk to
> memory once, each record read from the second file must be compared to as
> many records in the first file as it takes before a match is found.
>
>> We should try both just to see.
>
> I think that it might be difficult to do a valid timing test for such a
> small problem. I won't bother, mainly because I do not see it specifically
> as an issue of coming up with the fastest solution possible, as that would
> be better done in a compiled language anyway.
>
> /Al
>
>
>>
>> --
>> Jim Vierra
>>
>> "Al Dunbar [MS-MVP]" <alan-no-drub-spam@xxxxxxxxxxx> wrote in message
>> news:u5wu%23SfUFHA.3244@xxxxxxxxxxxxxxxxxxxxxxx
>> > Regardless of the actual format, and given that the ordering of
>> > accounts is going to be different in the two files, I would strongly
>> > (strongly) recommend against doing this:
>> >
>> >> >> So you need to compare line1 of file1 with every line in file2 and
>> >> >> so on for every line in file1.
>> >
>> > by brute force, i.e., for each line you read from one file,
>> > rewind/re-open the other and read in and check every line against the
>> > "master" line.
>> >
>> > There are two methods that would be useful to consider here:
>> >
>> > a) read both files in their entirety using .readall, and then process
>> > them as arrays of lines using the split function;
>> >
>> > b) read in one file, building up a dictionary object with the key being
>> > the account number, and the data being, well, whatever it is. Then read
>> > the second file, extract the account number, check to see if it is
>> > present in the dictionary object, flag an error if it is not, otherwise
>> > "merge" or "append" the data from this second file into the dictionary
>> > object. Then write out the records in the dictionary object to a file,
>> > noting which were not updated with data from the second file.
>> >
>> > /Al
>> >
>> > "Jim Vierra" <jvierra@xxxxxxx> wrote in message
>> > news:eMu%238yYUFHA.3152@xxxxxxxxxxxxxxxxxxxxxxx
>> >> A sample set of lines would be more helpful. You can mask anything
>> >> private and only send 5 or 6 lines that may have different structures.
>> >>
>> >> --
>> >> Jim Vierra
>> >>
>> >> "Scott Burns" <ScottBurns@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
>> >> news:76A9598C-6B51-4375-8176-2B380752CCD8@xxxxxxxxxxxxxxxx
>> >> > Basically this is an account number that is 12 digits long, found in
>> >> > different parts of each of the files. I need to be able to verify
>> >> > that they are the same lines (you know, apples for apples) and then
>> >> > merge the data in the two files together into one line.
>> >> > --
>> >> > Scott Burns
>> >> >
>> >> >
>> >> > "Jim Vierra" wrote:
>> >> >
>> >> >> So you need to compare line1 of file1 with every line in file2 and
>> >> >> so on for every line in file1.
>> >> >> What is the template for the item you are searching for?
>> >> >> Is it a stable template (e.g. 999999 or 999-99-9999 or
>> >> >> (999)999-9999) or is it variable (e.g. sometimes it's 99, sometimes
>> >> >> 9999), where the "9" is a template for any digit?
>> >> >> Is it always in the same place in the line?
>> >> >>
>> >> >> Please understand that you have not given enough information to be
>> >> >> able to design a method for doing this. Suppose more than one number
>> >> >> is in the line and the number you want is the second or third
>> >> >> number. How do we know which number to match?
>> >> >> --
>> >> >> Jim Vierra
>> >> >>
>> >> >> "Scott Burns" <ScottBurns@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in
> message
>> >> >> news:F3AF02FC-FBAA-4AC7-867B-2F33464BE190@xxxxxxxxxxxxxxxx
>> >> >> > Below is what I have written so far. I have file1 that comes from
>> >> >> > one computer with dollar figures in it listed by account number in
>> >> >> > no sequential order. I have File2 from another computer that is in
>> >> >> > no order either. I need to readline in file2 to find an account
>> >> >> > number. I then need to find the same account number in file1 and
>> >> >> > print specific information in a specific format to one text file.
>> >> >> >
>> >> >> > Both files contain about 200 or so lines and could contain more
>> >> >> > than that. I know I need to use one file as my master to query the
>> >> >> > second file with. I have gotten as far as the first line of data
>> >> >> > but I can't get it to go to the next line of both files. It only
>> >> >> > queries the file for the first line.
>> >> >> >
>> >> >> > Please help.
>> >> >> >
>> >> >> > Dim fso, Tip, file1, file2
>> >> >> > Set fso = CreateObject("Scripting.FileSystemObject")
>> >> >> > set Tip = fso.CreateTextFile("Tip.txt")
>> >> >> > set file2 = fso.OpenTextFile("79600")
>> >> >> > set file1 = fso.OpenTextFile("0241_2005A")
>> >> >> > Do While Not file2.AtEndOfStream
>> >> >> > str=file2.ReadLine
>> >> >> > str2=file1.Readline
>> >> >> > if Mid(str,310,12)=Mid(str2,3,12) then
>> >> >> > 'Tip.WriteLine "P"&"B"&rtrim(Mid(str,310,12))&"241"&rtrim(Mid(str,12,10))&rtrim(Mid(str,437,9))&rtrim(Mid(str2,10,9))
>> >> >> > 'above is what I want but below is my test of duplicating file names to make sure it works
>> >> >> > Tip.WriteLine "File2"&rtrim(Mid(str,310,12))&"File3"&rtrim(Mid(str2,3,12))
>> >> >> > End if
>> >> >> > Loop
>> >> >> > Tip.Close
>> >> >> > file2.Close
>> >> >> > file1.Close
>> >> >> > Wscript.Quit
>> >> >> >
>> >> >> > --
>> >> >> > Scott Burns
>> >> >> >
>> >> >> >
>> >> >> > "Al Dunbar [MS-MVP]" wrote:
>> >> >> >
>> >> >> >>
>> >> >> >> "Scott Burns" <ScottBurns@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in
>> > message
>> >> >> >> news:A5255BB8-8C97-47F1-972E-CABF4ECA518C@xxxxxxxxxxxxxxxx
>> >> >> >> > I am looking for some help with a script. I am trying to read
>> >> >> >> > two text documents line by line and find similar data. After I
>> >> >> >> > find that similar data, like an account number, I want to then
>> >> >> >> > merge the two to the same line. Please help out if possible.
>> >> >> >>
>> >> >> >> Your difficulty may lie in the vagueness of the description of
>> >> >> >> the problem and of the problem set.
>> >> >> >>
>> >> >> >> By "merge the two to the same line" do you mean to write out the
>> >> >> >> two similar records to a third file on the same line, or to cause
>> >> >> >> the original similar lines to be changed in the two input files
>> >> >> >> such that they both contain a copy of each of the two similar
>> >> >> >> values?
>> >> >> >>
>> >> >> >> By similar data, do you mean numerical values that are
>> >> >> >> mathematically close to each other? How close? Or do you mean
>> >> >> >> words that sound the same, like "bow" and "bough"? Or would
>> >> >> >> "vvvvvvvv" be similar to "wwwwwww"?
>> >> >> >>
>> >> >> >> If the value in line 10 in file A is similar to the value in
>> >> >> >> line 15 in file B would you want to match line 15 in file A with
>> >> >> >> line 10 in file B if they happened to be similar?
>> >> >> >>
>> >> >> >> Or... is it a line by line match: if line 1 in both files do not
>> >> >> >> match, do nothing. if line 2 does match, write the output to a
>> >> >> >> third file?
>> >> >> >>
>> >> >> >> Perhaps a sample set of data showing us which data you consider
>> >> >> >> similar and which you do not, plus the desired result.
>> >> >> >>
>> >> >> >> /Al