Re: Why do you use XML?



On Feb 8, 8:23 am, "Jon Skeet [C# MVP]" <sk...@xxxxxxxxx> wrote:
On Feb 8, 3:15 pm, "jehugalea...@xxxxxxxxx" <jehugalea...@xxxxxxxxx>
wrote:

I think it's a good idea to use the right format for the job, but XML
has significant benefits in many places:

1) Reasonable character encoding support

How often is this a real concern? I assume you are talking about
Unicode, UTF8, etc. How hard is it to generate a Unicode file? How
often can't you figure that out?

Often. You'd be amazed at how often people guess at encodings and get
it wrong.

I suppose. I did recently mistake an ASCII file for Unicode and
had all sorts of trouble. Thank goodness StreamReader is smarter than
I am.
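
To make the "guess at encodings and get it wrong" point concrete, here is a minimal sketch in Python (not the C#/StreamReader code discussed above). The sample string is hypothetical; the point is only that a wrong guess often raises no error and silently garbles the text.

```python
# Hypothetical sample text; any non-ASCII string shows the same effect.
text = "café"
data = text.encode("utf-8")      # the bytes as they would sit in a file

# Wrong guess: Latin-1 accepts every byte, so no exception is raised
# and the result is quietly garbled.
wrong = data.decode("latin-1")

# Correct guess: the original text comes back intact.
right = data.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

A format that declares its encoding up front, as XML can in its declaration, removes the guessing step.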


2) No need to manually escape and unescape data (did your SSV file have
   no values with spaces in?)

SSV files don't care where the spaces are. The position and length of
fields are static.

That doesn't sound like a space-separated value file. That sounds like
a fixed record file to me. Different kettle of fish, IMO.

Indeed you are right. The standard method for handling embedded
spaces (or even commas in CSVs) is to use quotes or some other
character. Again, Regex handles this very well.


XML doesn't need to limit the size of a field; however, neither does
CSV. A quick regular expression can break a CSV file apart in no time
(it is slower though).

Except you've got to know exactly which flavour of CSV is being used.
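
A small sketch of the "flavour" problem, using Python's standard csv module rather than a regex. The sample line is made up; it assumes a producer that uses semicolons as the delimiter and quotes fields that contain a comma.

```python
import csv
import io

# Hypothetical line from a semicolon-delimited file with a quoted field.
line = '1;"3,5";x\n'

def parse(text, delimiter):
    """Parse a single line with the given delimiter and return its fields."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))

as_semicolon = parse(line, ";")   # the flavour the producer intended
as_comma = parse(line, ",")       # a reader assuming comma-separated data

print(as_semicolon)
print(as_comma)
```

Both calls succeed without an error, but only one of them returns the three fields the producer meant, which is why the reader has to be told the flavour out of band.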

It is just the same as not knowing which XML format you are using. XML
can represent the same data in many (unlimited?) different ways. If code
is written correctly, the amount of code aware of the data source is
minimal. Unless you are working with an extremely developer-unfriendly
format, I say it is just as easy to pull data out no matter
what. The only benefit to XML is that someone already did most of the
parsing work for you, via XPath for example. However, writing a custom
parser can be as simple as a regular expression, so why force your input
into XML when SSV or CSV is readily available?


3) Natural way of representing hierarchies, or multiple tables etc -
   SSV/CSV really only describes a single rectangular table easily

I agree this is a good use of XML. However, a long time ago, you
handled hierarchies with a relational schema and had the entities in
separate files, searching on an index (which was usually the line
number), which was typically fast and easy to code to.

Searching on line numbers only works with fixed length records (which
has limitations) *and* if you're using a fixed-width encoding - so
UCS-2 is okay, but UTF-8 isn't, for example. It also means having to
manage more files.
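
The offset arithmetic behind this can be sketched as follows. The record width, the names, and the helper `record_at` are all hypothetical. Python has no codec called "ucs-2", so the sketch uses "utf-16-le", which is byte-for-byte the same for the characters used here.

```python
WIDTH = 4  # hypothetical fixed record width, in characters

records = ["Ann", "José", "Bob"]
padded = "".join(r.ljust(WIDTH) for r in records)

utf16 = padded.encode("utf-16-le")  # 2 bytes per character for this data
utf8 = padded.encode("utf-8")       # "é" takes 2 bytes, the rest take 1

def record_at(data, index, bytes_per_char, encoding):
    """Jump straight to record `index`, assuming a fixed byte size per record."""
    size = WIDTH * bytes_per_char
    start = index * size
    return data[start:start + size].decode(encoding, errors="replace")

# With the fixed-width encoding, the computed offset lands on the record.
print(repr(record_at(utf16, 2, 2, "utf-16-le")))

# With UTF-8, the extra byte for "é" shifts everything after it,
# so the same arithmetic lands in the wrong place.
print(repr(record_at(utf8, 2, 1, "utf-8")))
```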

They don't "have" to be fixed-length records. There just needs to be
an agreement that there is one record per line. The additional benefit
is that you aren't forced to send a 20MB file if all your user wants
are the child records, which is a common scenario.

I'm not arguing with you. I am just pointing out that it is a matter
of the environment, and so far what you have said doesn't entice me to
rely on XML. I still don't see how XML can benefit me *here*.


4) Reasonably self-descriptive, with element and attribute names

I agree about this too. However, again, most other formats support a
header line that has the name of the column. Usually that is all that
is needed. Using my example above, attributes translate to values and
child elements to relations. It can be equally represented.

When you say "usually" - out of what range of uses? Most of my data
sources aren't represented in just a single table form - so I've
either got to have multiple files, or use a richer format such as XML.

In my environment, we work primarily with CSV and SSV. Relationships
are typically formed on the database or are fixed. Again, XML repeats
the data definition every time. I would call this the biggest waste
generator when using XML. It's just not that practical in my
environment.
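
A rough way to see the overhead of repeating the element names on every record is to serialize the same rows both ways and compare sizes. This is only a sketch with made-up data and made-up element names (`rows`, `row`, `id`, `name`, `city`); real ratios depend on the data and on how the XML is designed (attributes instead of child elements, compression, and so on).

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical table: 100 rows, three short columns.
fields = ["id", "name", "city"]
rows = [{"id": str(i), "name": f"name{i}", "city": f"city{i}"}
        for i in range(100)]

# CSV: the column names appear once, in the header line.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields)
writer.writeheader()
writer.writerows(rows)
csv_size = len(buf.getvalue().encode("utf-8"))

# XML: each value is wrapped in an element named after its column.
root = ET.Element("rows")
for row in rows:
    rec = ET.SubElement(root, "row")
    for key in fields:
        ET.SubElement(rec, key).text = row[key]
xml_size = len(ET.tostring(root, encoding="utf-8"))

print(csv_size, xml_size)
```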


5) Well-defined standard - if you give me an XML file, I need to
   ask you enough information to *understand* the data, but I don't
   need more information in order to *load* it. Compare that with
   finding out which particular brand of SSV/CSV/etc you're using

This is another good point. Again, even XML formats are typically
application specific, not global such as XHTML. To some degree the
developer must always know what he is working with. In my opinion, it
takes as long to understand an SSV file as it does to understand an
XML file. A DTD surely helps, and having the nesting directly in the
file is also beneficial.

There's a difference between understanding the syntax and
understanding the semantics. If it's an application-specific file,
you'll have to have some description of the semantics whatever format
you use - but with XML you don't need to have anything extra to
describe the syntax (beyond the option of using a DTD, which the
developer doesn't need to understand - they just give to the XML
parser).
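
As an illustration of loading without any extra description of the syntax, here is a sketch using Python's standard ElementTree parser. The document and its element names are invented for the example; the `walk` helper knows nothing about what an "order" or an "item" means.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the loading code below does not depend on these names.
doc = """<orders>
  <order id="1"><item>Widget, large</item><item>Bolt</item></order>
  <order id="2"><item>Nut &amp; washer</item></order>
</orders>"""

root = ET.fromstring(doc)

def walk(node, depth=0):
    """Yield (depth, tag, attributes, text) for every element in the tree."""
    text = (node.text or "").strip()
    yield depth, node.tag, dict(node.attrib), text
    for child in node:
        yield from walk(child, depth + 1)

rows = list(walk(root))
for depth, tag, attrs, text in rows:
    print("  " * depth + tag, attrs, repr(text))
```

The parser handles nesting, the comma inside a value, and the escaped ampersand on its own. Deciding what to do with an order or an item is still the developer's job, which is the semantic half of the question.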

You make it sound like your parser "knows" what to do with what it
extracts. What does it extract, where does it go? How does it
magically make your code know what to do? At some point you the
developer must know what the data *means*. You need to know where to
find the data. You need to know where to put it. That requires knowing
the syntax *and* the semantics. Unless you have found a way to
overcome this need for developer intervention? Did ya? 'Cause that
would be something I would like to have in my possession!


But in what situations have you found XML useful? I am mostly
concerned with knowing a good time to use XML. Like I said, I hear the
hype all the time, but have rarely found a practical application of it.

I use XML all the time, in many, many situations. Persistence,
serialization, project files for Visual Studio, configuration files
for applications, RSS feeds... whereas I can't remember the last time
I had to write code to deal with a fixed size record file.

Jon

But if you think about it, all of these applications of XML are due to
someone deciding to jump on the XML train. They didn't have to use
XML. What about the "in-between"? Anyone can send a format over a
network. Does XML provide a benefit that couldn't have been achieved
otherwise? Or do these applications use XML because it makes their
code seem more up-to-date? Or do they use XML because of the pre-built
tools available? I would venture that someone could create a CSV
standard document format and the tools to work with it that could
rival any XML-based platform. Create the tools and you will have
people who use your format. Especially if you can convince them that
it is "the way" to do data transfer.

When was the last time you had to create your own XML format and use
it in your own code? Are we just working with tools that make our
lives easier, and it is just coincidental that they used XML? I can't
say. I feel as though there is some reason for all the hype. I just
wish I knew how to utilize it.