Re: Determining internal format of file


From: Oenone (noone_at_nowhere.com)
Date: 08/02/04


Date: Mon, 2 Aug 2004 12:29:16 +0100


> By looking at the contents of the file using a function or sub, how
> can we determine reliably if it is an ASCII file or an Excel file?

Standard .xls files are binary files. If you open one in a binary editor (or
a good text editor such as UltraEdit), you'll find that the first 100 or so
bytes consist mainly of 0x00 and 0xFF bytes (there are other values too, but
there are lots of these). Those byte values should never appear in a plain
ASCII .csv file unless it has been corrupted. The exception is a .csv saved
as 16-bit Unicode (UTF-16), which is full of 0x00 bytes. So you could just
open the file, scan the first 64 bytes or so and see whether any of them is
0x00 or 0xFF. If so, it's an Excel file (assuming those are the only two
possibilities). If not, it isn't, though that doesn't necessarily mean it is
a .csv file, of course.
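A minimal Python sketch of the check described above; the function names are hypothetical. It reads the first 64 bytes in binary mode and looks for 0x00 or 0xFF. As a stricter alternative that is not part of the original suggestion, it also compares the first eight bytes against the OLE2 compound-document signature D0 CF 11 E0 A1 B1 1A E1, which binary .xls files from Excel 97-2003 start with. Other OLE2 files, such as Word .doc files, share that signature, so it identifies the container rather than Excel specifically.

```python
# Binary .xls files written by Excel 97-2003 are OLE2 "compound
# documents", which begin with this fixed 8-byte signature.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_binary_xls(path, sample_size=64):
    """The heuristic from the post: True if any of the first
    `sample_size` bytes is 0x00 or 0xFF."""
    with open(path, "rb") as f:
        head = f.read(sample_size)
    # Iterating over a bytes object yields ints in Python 3.
    return any(b in (0x00, 0xFF) for b in head)


def has_ole2_signature(path):
    """Stricter alternative (not from the post): compare the first
    8 bytes against the OLE2 compound-document signature."""
    with open(path, "rb") as f:
        return f.read(len(OLE2_SIGNATURE)) == OLE2_SIGNATURE
```

An ordinary comma-separated text file should make both functions return False, while a binary .xls should make both return True.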

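Excel 2003 can also save a workbook as an "XML Spreadsheet", which is a
plain-text XML file rather than a binary one. Files saved that way are
ordinary text, so the check above won't catch them; you'll need to test for
them separately (for example, by looking for an XML declaration at the start
of the file).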
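If you do need to recognise an XML-based workbook (Excel 2003 can write one through its XML Spreadsheet save option), here is a rough Python sketch with hypothetical function names. It assumes an 8-bit encoding such as UTF-8 (a UTF-16 file would not be recognised). It also assumes that the spreadsheet namespace `urn:schemas-microsoft-com:office:spreadsheet`, which Excel declares on the root `Workbook` element, appears within the first kilobyte of the file.

```python
# Namespace declared by Excel's "XML Spreadsheet" format. Assumption: it
# appears within the first `sample_size` bytes, on the root Workbook element.
SPREADSHEET_NS = b"urn:schemas-microsoft-com:office:spreadsheet"
UTF8_BOM = b"\xef\xbb\xbf"


def read_head(path, sample_size):
    """Return up to `sample_size` bytes from the start of the file."""
    with open(path, "rb") as f:
        return f.read(sample_size)


def looks_like_xml(path, sample_size=1024):
    """True if the file starts with an XML declaration, ignoring a UTF-8
    byte order mark and leading whitespace. 8-bit encodings only."""
    head = read_head(path, sample_size)
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    return head.lstrip().startswith(b"<?xml")


def looks_like_excel_xml(path, sample_size=1024):
    """An XML file whose opening bytes mention the spreadsheet namespace."""
    if not looks_like_xml(path, sample_size):
        return False
    return SPREADSHEET_NS in read_head(path, sample_size)
```

A CSV file fails the first test immediately, so the namespace search only runs on files that already look like XML.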

Plain ASCII text files should only contain byte values 32-126 inclusive
(the printable characters; 127 is the DEL control character), plus 10 and
13 (line feed and carriage return) and, commonly, 9 (tab).
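A minimal sketch of a whole-file version of that rule in Python; the function name is hypothetical. It accepts printable ASCII (32-126; 127 is the DEL control character) plus tab, line feed and carriage return. The file is read in chunks, and `bytes.translate(None, delete)` strips every allowed byte, so anything left over is a disallowed byte.

```python
# Allowed byte values: printable ASCII 32-126 plus tab (9),
# line feed (10) and carriage return (13).
ALLOWED = bytes(sorted(set(range(32, 127)) | {9, 10, 13}))


def is_plain_ascii(path, chunk_size=64 * 1024):
    """Return True if every byte in the file is in ALLOWED."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True  # reached end of file without a bad byte
            # Delete all allowed bytes; any remainder is disallowed.
            if chunk.translate(None, ALLOWED):
                return False
```

Note that a CSV containing accented characters (in Latin-1 or UTF-8, for example) will fail this strict test even though it is a perfectly good text file.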

Hope that's of some help,

-- 
(O)enone


Relevant Pages

  • Re: FTP problem
    ... Doesn't z/OS Unix use LF for ASCII files and NL for EBCDIC files? ... that ISO8859-1 specifies control characters at all). ... For IBM-MAIN subscribe / signoff / archive access instructions, ...
    (bit.listserv.ibm-main)
  • Funny characters in array after file() load.
    ... I am loading simple ASCII files that are delimited by newlines. ... except that the first string in the array is always ... The characters are absolutely not in the files themselves. ... novice PHP developer, ...
    (alt.php)
  • Re: Funny characters in array after file() load.
    ... > I am loading simple ASCII files that are delimited by newlines. ... They load ... except that the first string in the array is always ... > The characters are absolutely not in the files themselves. ...
    (alt.php)
  • Re: Cant a CSS file start with a comment to be validated?
    ... the CSS files gets the following characters at the start: ... Remove the BOM from ASCII files! ...
    (comp.infosystems.www.authoring.stylesheets)
  • Saving double-byte characters as unicode text in a CSV file
    ... to save some double-byte characters that I see in an Excel file into a ... comma-delimited CSV file. ... Open the file in Wordpad or Notepad ...
    (microsoft.public.excel.misc)