Re: Determining Encoding of a file

From: Jon Skeet [C# MVP] (skeet_at_pobox.com)
Date: Tue, 20 Apr 2004 09:10:39 +0100

Bob <anonymous@discussions.microsoft.com> wrote:
> Is there something under the CLR that I can use to determine the
> encoding of a file that I am reading without me reading the first two
> bytes to see if they are set to a unicode BOM?

No. Strictly speaking, a file doesn't *have* an encoding: many files
are perfectly valid in several encodings, with a different meaning
depending on which one they're loaded with. There's no reliable way of
determining the encoding - but looking at the first few bytes (a UTF-8
BOM is three bytes, not two) may give you a good *idea*.
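
To make the "first few bytes" check concrete, here is a minimal sketch.
It is written in Python rather than C#, purely for illustration, and the
function name guess_encoding_from_bom is hypothetical, not part of any
framework. It only recognises the standard Unicode BOMs; a result of
None means "no BOM found", not "not Unicode".

```python
import codecs

# BOM signatures, longest first: the UTF-32 LE BOM (FF FE 00 00) begins
# with the UTF-16 LE BOM (FF FE), so it has to be tested before it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding_from_bom(data):
    """Return a guessed encoding name for the bytes, or None if no BOM.

    This is a heuristic only: a file without a BOM may still be Unicode,
    and a non-Unicode file may happen to start with BOM-like bytes.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# The same byte is valid, but means something different, in two encodings.
as_latin1 = b"\xe9".decode("latin-1")   # Latin small letter e with acute
as_cp1251 = b"\xe9".decode("cp1251")    # Cyrillic small letter short i
```

Reading only the first four bytes of the file (for example
open(path, "rb").read(4)) is enough input for this check. The last two
lines show why the answer can never be certain: nothing in the byte
itself says which of the two readings was intended.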

-- 
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

