Re: XML Processing instruction for UTF-8

Jochen Kalmbach [MVP] wrote:
> Hi Alex!
>
>> How can you tell that it is not UTF-8?
>
> Most UTF-8 documents containing a BOM:
> http://www.unicode.org/faq/utf_bom.html

Yes, I know that a BOM can be used to determine the
serialization encoding. However, MSXML will not save a BOM
for UTF-8, as you probably already know. In fact, a BOM is
not required by the XML specification: UTF-8 is assumed
unless a BOM or the encoding declaration (encoding="..." in
the <?xml ...?> line) specifies otherwise.
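
To make the rule above concrete, here is a rough sketch in Python
(not MSXML code, and not the full autodetection procedure from the
XML specification). The function name guess_xml_encoding is made up
for illustration, and the sketch assumes that a document without a
BOM is at least ASCII-compatible, so the declaration can be read as
bytes:

```python
import codecs
import re

# Known BOM signatures. UTF-32 is checked before UTF-16 because the
# UTF-32-LE signature begins with the same bytes as the UTF-16-LE one.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches encoding="..." inside a leading <?xml ...?> declaration.
_DECL = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def guess_xml_encoding(data: bytes) -> str:
    """Hypothetical helper: BOM first, then declaration, else UTF-8."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # Neither a BOM nor an encoding declaration: UTF-8 is assumed.
    return "utf-8"
```

A BOM-less file with no encoding declaration, which is what the
MSXML case above produces, falls through to the last line and is
treated as UTF-8.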

Strictly speaking, "BOM" is a misnomer for a UTF-8 stream,
since the concept of byte ordering does not apply to UTF-8
(unlike UTF-16/32); there the BOM really serves as a magic
number. This is noted in the above-mentioned FAQ, too.
(http://www.unicode.org/faq/utf_bom.html#3)
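
A small Python illustration of that point (again only a sketch; the
helper name strip_utf8_signature is invented here): the character
U+FEFF has two possible byte sequences in UTF-16, depending on byte
order, but only one in UTF-8, so in UTF-8 it can act only as a
signature:

```python
import codecs

BOM_CHAR = "\ufeff"

# In UTF-16 the byte sequence depends on the byte order...
utf16_le = BOM_CHAR.encode("utf-16-le")  # bytes FF FE
utf16_be = BOM_CHAR.encode("utf-16-be")  # bytes FE FF

# ...while in UTF-8 there is only one possible sequence: EF BB BF.
utf8_sig = BOM_CHAR.encode("utf-8")


def strip_utf8_signature(data: bytes) -> bytes:
    """Hypothetical helper: drop a leading UTF-8 signature, if any."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Applied to a file that has no signature, strip_utf8_signature
returns the data unchanged.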

So, under Windows, using MSXML, one gets BOM-less XML
files by default.




