Re: Want Input boxes to accept unicode strings on Standard Window



What does this mean exactly? My own parser, which seems to work fine for
my purposes (serialization), simply writes std::string to and from XML.
Why does it need to understand UTF-8?


Well, it was a reply to your post, which complained that XML
(as a standard) "required these kinds of XML parsers" and that
such parsers are big.
My point was that a parser does not need to be big, or to support
all the encodings in the world.

If your parser uses std::string, there is no problem.
Just make sure the std::string contains UTF-8, and that you set
the encoding declaration to UTF-8. And if you read an XML file whose
encoding declaration is not UTF-8, you fail. And you are compliant.
You can also read/write ANSI using std::string, as long as you
set the encoding to that ANSI code page, and you check it at
load time.
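
The load-time check described above could look roughly like this. It is
only a minimal sketch: declared_encoding and encoding_matches are
hypothetical names, not part of any real parser, and the code assumes the
whole document is already in a std::string and that the XML declaration,
if present, is written in ASCII-compatible bytes.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns the value of the encoding pseudo-attribute in
// the XML declaration, or an empty string if no such value is found.
std::string declared_encoding(const std::string& xml) {
    std::size_t start = 0;
    // Skip a UTF-8 byte order mark if one precedes the declaration.
    if (xml.compare(0, 3, "\xEF\xBB\xBF") == 0) start = 3;
    // The XML declaration, when present, must come first in the document.
    if (xml.compare(start, 5, "<?xml") != 0) return "";
    const std::size_t end = xml.find("?>", start);
    if (end == std::string::npos) return "";
    const std::string decl = xml.substr(start, end - start);
    const std::size_t key = decl.find("encoding");
    if (key == std::string::npos) return "";
    // The value is quoted with either " or '; find the matching pair.
    const std::size_t open = decl.find_first_of("\"'", key);
    if (open == std::string::npos) return "";
    const std::size_t close = decl.find(decl[open], open + 1);
    if (close == std::string::npos) return "";
    return decl.substr(open + 1, close - open - 1);
}

// Upper-cases ASCII letters so encoding names compare case-insensitively.
static std::string to_upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// Hypothetical load-time check: true only when the declared encoding is the
// one this parser was written for (for example "UTF-8" or "windows-1251").
bool encoding_matches(const std::string& xml, const std::string& expected) {
    std::string declared = declared_encoding(xml);
    // With no encoding declaration (and no UTF-16 BOM), XML defaults to UTF-8.
    if (declared.empty()) declared = "UTF-8";
    return to_upper(declared) == to_upper(expected);
}
```

A loader would call encoding_matches(text, "UTF-8") before parsing and
refuse the file when it returns false, instead of silently treating the
bytes as whatever the local code page happens to be.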

If you don't read UTF-16 and UTF-8 XML files, your parser is not
standard compliant, but you might not care, if you don't have
to interact with applications that are compliant.

But if your parser writes using ANSI on a Japanese system and
reads it as ANSI on a Russian one, ignoring the encoding directive,
then you have to care, because you corrupt your own data.
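
To make the corruption concrete, here is a small sketch of how one and
the same byte decodes under the two ANSI code pages in question. The
function names are made up for illustration, and each function covers
only one contiguous range of its code page (0xA1..0xDF for the Japanese
code page 932, 0xC0..0xFF for the Russian code page 1251), not the full
tables.

```cpp
#include <cstdint>

// Illustrative only: code page 932 maps the single bytes 0xA1..0xDF to the
// half-width katakana block starting at U+FF61. Returns 0 outside that range.
std::uint32_t cp932_single_byte_to_unicode(unsigned char b) {
    if (b >= 0xA1 && b <= 0xDF) return 0xFF61u + (b - 0xA1u);
    return 0;
}

// Illustrative only: code page 1251 maps the bytes 0xC0..0xFF to the
// Cyrillic letters U+0410..U+044F. Returns 0 outside that range.
std::uint32_t cp1251_letter_to_unicode(unsigned char b) {
    if (b >= 0xC0) return 0x0410u + (b - 0xC0u);
    return 0;
}
```

The byte 0xC0 comes out as U+FF80 (a half-width katakana) in the first
function and as U+0410 (Cyrillic capital A) in the second, so text
written on the Japanese system turns into unrelated Cyrillic letters
when the Russian system reads it with its own default code page.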



--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email


