Re: Large text file to a CSV file...

From: John Nurick (j.mapSoN.nurick_at_dial.pipex.com)
Date: 05/27/04


Date: Thu, 27 May 2004 22:14:10 +0100

Hi Louie,

Sorry for the slow response; I've been away from home.

If you've really got a 3,094,112 kB (about 3 GB) CSV file you're not
going to be able to import it into Access anyway: the maximum possible
size of an MDB file is 2GB, and that's also the limit on the size of an
MSDE database.

But it's easy for programs to modify huge textfiles. The trick is simply
to read a line at a time, process it, and then write the processed line
to a new file. This VBScript script converts brackets to quote marks
(and can be modified do make other changes):

Option Explicit
'VBScript template for processing textfiles line by line
Const BAK_EXT = ".bak"
Dim fso 'FileSystemObject
Dim fF 'File
Dim fIn, fOut 'Textstreams
Dim strL 'String

Function ProcessedLine(ByVal strLine)
  'This does all the work.
  'Put code here that returns a string to
  'write to the output file
  ProcessedLine = Replace(Replace(strLine, "[", """"), "]", """")
End Function

'Main body of script iterates through the file
'applying the ProcessLine function to each line
If WScript.Arguments.Count = 1 Then
  Set fso = CreateObject("Scripting.FileSystemObject")
  Set fF = fso.GetFile(WScript.Arguments(0))
  
  'If backup file aready exists, delete it
  If fso.FileExists(fF.Path & BAK_EXT) Then
    fso.DeleteFile fF.Path & BAK_EXT
  End If
  fF.Name = fF.Name & BAK_EXT
  Set fIn = fF.OpenAsTextStream(1) 'ForReading
  Set fOut = fso.CreateTextFile(WScript.Arguments(0))
  
  Do 'Read line by line
    strL = fIn.ReadLine
    strL = ProcessedLine(strL)
    fOut.Write strL & vbCRLF
  Loop Until fIn.AtEndOfStream
  
  'Tidy up
  fIn.Close
  fOut.Close
Else
  MsgBox "Drag a file onto the icon to process it. " _
    & vbcrlf & "Original file will be renamed with " _
    & ".bak extension"
End If

On Tue, 25 May 2004 06:38:59 -0700, "Louiie Warren" <lwarren@ticom.com>
wrote:

>There's the rub... I would have to write a preprocessor
>routine because a 3,094,112KB file would be difficult (to
>say the least) to edit. WSP does imbed the - when there
>is no value. I may be finding that this is more work than
>I need to go through.
>
>
>>-----Original Message-----
>>Hi Louie,
>>
>>Access's text import routine has problems with files like
>this, because
>>the space is not just being used as a delimiter, it also
>appears in the
>>data without being "qualified" by enclosing the entire
>field in quote
>>marks.
>>
>>It looks as if the problematic field is the datestamp.
>Using "-" for an
>>unknown username shouldn't be a problem because it
>doesn't introduce any
>>extra spaces, though if WebSite Pro uses " - " it would.
>>
>>One approach would be to pre-process the file by simply
>replacing the [
>>and ] around the datestamp with " and ". That should
>convert it into a
>>regular space-delimited file, with fields that may
>contain a space
>>qualified with " " . It would then be possible to import
>it to a
>>temporary Access table, where you could run a query to
>parse the
>>datestamp field and convert it to an Access date time
>value as you
>>append it to your "permanent" table.
>>
>>The pre-processing could be done in a text editor, or if
>it's a regular
>>task you could work up a VBA routine using Line Input #
>to read the file
>>line by line, the Replace() function to change the line,
>and Print # to
>>write it to a new file. Then import the new file.
>>
>>Another approach would be to use a regular expression
>engine to parse
>>the input line into fields and append these directly to
>your table.
>>
>>
>>On Fri, 21 May 2004 09:50:16 -0700, "Louie Warren"
><lwarren@ticom.com>
>>wrote:
>>
>>>I have a WebsitePro Access Log Combined (W3C/NCSA)
>Format
>>>file; and yes, I know there is an ODBC routine out there
>>>to do this, but my company doesn't want to spend the
>money
>>>to get it because we are abandoning WebSite Pro in the
>>>next few months. The data is on a classified system so
>I
>>>will attempt to recreate the problem with generic data.
>I
>>>already see problems with the blank delimiters...
>>>
>>>An example from the book (string these together with a
>>>space between fields.
>>>
>>>172.18.96.157 (IP Address)
>>>
>>>gryphonaire.com (Hostname of server
>>>receiving request)
>>>
>>>Admin (Authentication user
>name)
>>>
>>>[02/Jan/2001:09:44:56 - 0800] (Date timestamp UTC
>offset)
>>>
>>>"GET /members.html HTTP/1.1" (Complete HTTP request)
>>>
>>>200 (HTTP Response Code)
>>>
>>>178 (Number of bytes
>transfered)
>>>
>>>"http://gryphonaire.com/index.html" (Complete referring
>>>URL)
>>>
>>>"Mozilla/4.0 (compatible; MSIE 4.01; Windows NT)" (ID
>of
>>>browser)
>>>
>>>"Member=WebSiteRules" (Cookie)
>>>
>>>The problem in the Access import is if the
>Authentication
>>>username isn't available. WSP puts in a - This is
>>>common for most of the requests. When an actual ID is
>>>present, it doesn't fit the pattern with the others.
>The
>>>data is consistantly like the above example (", [, and
>>>all).
>>>
>>>Hopefully that explains it.
>>>
>>>L
>>
>>--
>>John Nurick [Microsoft Access MVP]
>>
>>Please respond in the newgroup and not by email.
>>.
>>

--
John Nurick [Microsoft Access MVP]
Please respond in the newgroup and not by email.


Relevant Pages

  • Re: Export to a flat file - Success
    ... Dim fIn, fOut 'Textstreams ... Dim strL 'String ... Set fOut = fso.CreateTextFile) ... MsgBox "Drag a file onto the icon to remove all linebreaks. ...
    (microsoft.public.access.externaldata)
  • Re: How to get the maximum record length in a file
    ... Here's a VBScript that should do the same job, ... Dim fso 'FileSystemObject ... Dim strL 'String ... Dim LineLength ...
    (microsoft.public.access.externaldata)

Loading