Re: Text File Cleanup Utility



I usually use Perl (free from www.activeperl.com) for this sort of job,
but commercial tools such as TextPipe Pro (www.datamystic.com) and
Monarch (www.datawatch.com) may be easier to use if you don't enjoy
programming.

If you know how many header and footer lines need to be deleted, the
Perl script I've pasted at the end of this message will do the job. It
chops off a specified number of lines from beginning and end of every
file that matches a wildcard filespec. You'd call it with something like
this:
perl "D:\folder\TopAndTail.pl" "*.txt" 4 5 -b

If you don't know how many lines you need to delete, the general idea is
to use a tool such as Perl (or your favourite programming or scripting
language) to read the file line by line and write each line to a new
file if it matches the pattern of the "data" lines, or if it doesn't
match the pattern of the header/footer lines. For example, if your data
is comma-separated but the header lines don't contain commas, you can
process all the .txt files in a folde with something like this:
perl -i.BAK -pe"print if m/,/" "D:\folder\myfile.txt"


On Sat, 28 May 2005 22:35:23 -0700, Cheese
<Cheese@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

>I have roughly 1500 text files with comma delimited data that first need to
>be cleaned up before importing to Access. Does anyone know of a (preferably
>freeware) program that will accomplish this?
>
>The areas in need of cleanup are identical in terms of size, only the text
>differs by a little between files. What it boils down to is that the first
>and last 10 (or so) lines of trash need to be deleted so the file is purely
>comma delimited.
>
>Thanks for your help.


#start of code==================
#TopAndTail.pl
#Removes header and footer lines from text files.
#John Nurick 2004-05

use strict;
die "\nTopAndTail.pl: removes header and footer lines from text files.
\n\n" .
"Usage: perl TopAndTail.pl filespec hdrlines ftrlines [-b]\n\n" .
" filespec can be a wildcard \n" .
" hdrlines and ftrlines are number of lines to omit.\n" .
" -b is optional switch to leave original file with .BAK
extension.\n\n" .
"NB: slurps each file into memory so not suitable for very large
files."
unless defined $ARGV[2];

my ($hdrlines, $ftrlines) = @ARGV[1 .. 2];
my $iofile; #path and name of input file
my $bakname; #path and name for backup file

undef $/; #slurp entire file

my @files = glob($ARGV[0]); #treat first argument as a wildcard

foreach $iofile (@files) {
open IOFILE, $iofile or die "Couldn't open $iofile for input.\n";
my (@lines) = split "\n", <IOFILE>;
if ($#lines < $hdrlines + $ftrlines) {
warn "$iofile doesn't have enough lines!\n";
close IOFILE;
next;
}
close IOFILE;
if ($ARGV[3] =~ m(^-b)i ) { # -b switch
$bakname = $iofile;
$bakname =~ s/\.[^\.]*$/\.BAK/;
unlink $bakname; #delete old .BAK file if it exists
rename $iofile, $bakname;
}
pop @lines until $lines[-1]; #dump any empty line(s) due to \n at end
of infile
open IOFILE, "> $iofile" or die "Couldn't open $iofile for output.\n";
print IOFILE join "\n", @lines[$hdrlines .. $#lines - $ftrlines];
close IOFILE;
print STDOUT "$iofile\n"; #list files as they are processed
}
#end of code======================================
--
John Nurick [Microsoft Access MVP]

Please respond in the newgroup and not by email.
.



Relevant Pages

  • RE: Help Me with shell script please
    ... Because to my untrained eye your perl script ... > script that accesses the shell a wrapper. ... The "around the mail command" is the important part of ... and any fake header we want to make up on our own). ...
    (RedHat)
  • Re: Downgrade storable under Perl 5.8
    ... Hence Storable running on perl ... with 64 bit integers will read the header from a file written by a 32 bit ... correct backwards compatibility mode. ...
    (comp.lang.perl.modules)
  • Re: use strict; and O_WRONLY
    ... When I created a new XS module with h2xs, ... > constants I wanted to export to Perl. ... I wonder about the necessity of dlsym(). ... be mentioned in the header which has to be included in the XS file. ...
    (comp.lang.perl.misc)
  • Re: Broken pipe problem? If so, why?
    ... > message, reads the header, does some processing, and exits. ... > I am invoking this program in a pipe from procmail. ... Since your Perl program is only reading the header, ...
    (comp.lang.perl.misc)