Re: How to distinguish between binary and ASCII file on file opening?

From: Bill Thompson (billt61_at_rgv.rr.com)
Date: 08/15/04


Date: Sun, 15 Aug 2004 01:05:08 -0500


"CFF" <cffung@myrealbox.com> wrote in message
news:2560187f.0408091926.16ca0e19@posting.google.com...
> I am writing an application that will need to distinguish between
> binary and ASCII files loaded from the hard disk so that different
> processing is applied to different type of file.
>
> Is there any simple way to identify the type of file during the file
> opening process? Thank you.
>
> CFF

You could analyze character frequency. If the predominant characters are
A-Z and a-z, you probably have ASCII text; if the byte distribution is
closer to uniform, you probably have binary. The more characters you
check, the stronger the 'probably' becomes.
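Here is a minimal Python sketch of the letter-frequency test. The function names and the 0.5 threshold are my own assumptions, not a standard. For comparison, uniformly random bytes contain about 52/256 (roughly 20%) letters, while ordinary English text is mostly letters.

```python
def looks_like_text(data: bytes, threshold: float = 0.5) -> bool:
    """Guess whether `data` is ASCII text from the share of A-Z/a-z bytes.

    `threshold` is an arbitrary, hypothetical cutoff: tune it for your files.
    """
    if not data:
        return True  # empty input: nothing suggests binary
    letters = sum(1 for b in data if 65 <= b <= 90 or 97 <= b <= 122)
    return letters / len(data) >= threshold


def sniff_file(path: str, sample_size: int = 4096, threshold: float = 0.5) -> str:
    """Hypothetical helper: classify a file from a prefix of its bytes.

    A larger `sample_size` checks more characters, so the guess gets stronger.
    """
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    return "ASCII" if looks_like_text(sample, threshold) else "binary"
```

For example, `looks_like_text(b"Hello world, this is plain text.")` is true (25 of 32 bytes are letters), while `looks_like_text(bytes(range(256)))` is false.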

Any clues about the contents of the file can be useful as well; e.g., you
can recognize a CSV file by a consistent number of unquoted commas per line.
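Here is a rough Python sketch of the CSV check: count the commas outside double quotes on each non-blank line, and treat the text as CSV if every line has the same nonzero count. The function names and the two-line minimum are illustrative assumptions. The sketch also assumes no quoted field spans a line break, which real CSV allows.

```python
def unquoted_commas(line: str) -> int:
    """Count commas that are not inside a double-quoted field."""
    count = 0
    in_quotes = False
    for ch in line:
        if ch == '"':
            # A doubled "" escape toggles twice, so it has no net effect.
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            count += 1
    return count


def looks_like_csv(text: str, min_lines: int = 2) -> bool:
    """Hypothetical check: same nonzero unquoted-comma count on every line."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < min_lines:
        return False
    counts = {unquoted_commas(ln) for ln in lines}
    return len(counts) == 1 and counts.pop() > 0
```

With this sketch, `'name,age\n"Smith, John",42\n'` is recognized as CSV, because the comma inside the quotes is not counted. Ordinary prose with the occasional comma is not.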
