
An example of bad use of XML #rant

XML, the eXtensible Markup Language, was designed expressly to organize data into closely associated structures and to make that structured data easy to access.

Used in the right way and in the proper contexts (some purposes really only need an INI file), XML can be a very versatile and powerful tool, both as a data structuring language and as a protocol carrier.

The Document Object Model (DOM) specification provides an API for manipulating parsed XML documents, and implementations exist for nearly every language.

So perhaps you can understand why I was horrified when I came across the following snippet…

<key>Track ID</key><integer>1548</integer>
<key>Artist</key><string>Chris Vrenna</string>
<key>Composer</key><string>Chris Vrenna</string>
<key>Album</key><string>American McGee’s Alice</string>
<!-- skipped some ... -->

I extracted the above from my iTunes Music Library.xml file while looking into parsing it for a script to find and remove duplicates. As you can see, each value is identified only by its position relative to the preceding “key” element, rather than by its place in the element structure. XML was MADE for structure, precisely so that parsing it would be easy. Relying on positional relations isn’t just bad practice, it’s stupidity. I’m sorry, but it is.
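Reading the flat form correctly comes down to one rule: pair each <key> with whatever element immediately follows it. The minimal JavaScript sketch below applies that rule. It assumes the file follows Apple’s property-list layout, in which a <dict> holds alternating <key> and value elements (<string>, <integer>, <dict>, and so on). The helper names dictToObject and plistValue are hypothetical. The functions accept DOM elements (for example, from a browser’s DOMParser) or any object that exposes tagName, textContent and an array-like children.

```javascript
// Hypothetical helpers (not an iTunes or Apple API): read a plist-style <dict>
// by pairing each <key> with the element that immediately follows it.
// Works on DOM Elements or on any object with tagName, textContent, children.

// Convert one value element to a JavaScript value, based on its tag name.
function plistValue(el) {
  switch (el.tagName) {
    case "integer": return parseInt(el.textContent, 10);
    case "real":    return parseFloat(el.textContent);
    case "true":    return true;
    case "false":   return false;
    case "dict":    return dictToObject(el);
    case "array":   return Array.from(el.children).map(plistValue);
    default:        return el.textContent; // string, date, data
  }
}

function dictToObject(dictEl) {
  var result = {};
  var children = Array.from(dictEl.children);
  // The meaning of each value comes only from its position:
  // children[i] is a <key>, children[i + 1] is that key's value.
  for (var i = 0; i + 1 < children.length; i += 2) {
    result[children[i].textContent] = plistValue(children[i + 1]);
  }
  return result;
}
```

Called on the track <dict> from the snippet, this would produce an object along the lines of { "Track ID": 1548, Artist: "Chris Vrenna", ... }. With this design, the position-based pairing happens in one loop instead of being repeated at every lookup.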

Examine what happens when the above is structured the way XML is supposed to be used (note: the following is just one way of doing it right):

<track id="key1548">
  <tid type="integer">1548</tid>
  <Name type="string">Track04</Name>
  <Artist type="string">Chris Vrenna</Artist>
  <Composer type="string">Chris Vrenna</Composer>
  <Album type="string">American McGee’s Alice</Album>
  <!-- skipped some ... -->
</track>

At least two advantages arise from using the correct form:

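1) Now we can use the DOM Level 2 Core method getElementById on each track, provided the parser treats “id” as an ID attribute. DOM Level 2 Core only guarantees this for attributes declared as type ID, for example in a DTD.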

var track = ituneslib.getElementById("key1548"); // XML ID values cannot start with a digit

Previously, accessing it would have required iterating through the nodes sequentially until reaching the ID, then taking the next node to get the actual track data. Here is roughly what that takes in the badly structured version:

var track;
var keys = ituneslib.getElementsByTagName("key");
for (var i = 0; i < keys.length; i++) {
  if (keys[i].textContent == "1548") {
    track = keys[i].nextElementSibling; // skips whitespace text nodes
    break;
  }
}

DOM parsers come with the search methods built in. Might as well use them!

2) Each child node is named, so it can be accessed directly. What used to be the changeable tag name (integer, string) is now a “type” attribute, and the value is the element’s direct text child.

// An element's nodeValue is null, so read its text content instead
var name = track.getElementsByTagName("Name")[0].textContent;
var artist = track.getElementsByTagName("Artist")[0].textContent;

Previously, accessing it would have meant iterating over each “key” node until finding one with the right text content, then stepping to the next node (whose tag name varies) before reading the value we want.

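var name;
var artist;
var keys = track.getElementsByTagName("key");
for (var i = 0; i < keys.length; i++) {
  var label = keys[i].textContent;
  var value = keys[i].nextElementSibling; // <string>, <integer>, ...
  if (label == "Name") {
    name = value.textContent;
  } else if (label == "Artist") {
    artist = value.textContent;
  }
}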

I can only imagine two reasons why iTunes uses this sloppy structure:

  • the entire library is loaded into an in-memory database anyway when iTunes starts up, so that the appropriate sorting can happen while the application is in use. It’s a one-time parse during the life of the process.
  • Apple do not want anyone else parsing their data, so they make it really annoying.
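The first reason at least makes sense. If the file is read once at startup, the sequential walk only has to happen once. The hedged sketch below shows that idea. It builds a Map from track ID to track element in a single pass over the top-level Tracks <dict>, which is assumed to hold alternating <key>track ID</key> and <dict> track entries, as in the library file. indexTracks is a hypothetical name. Its input can be a DOM element or any object with tagName, textContent and an array-like children.

```javascript
// Hypothetical helper: index the Tracks <dict> once so later lookups by
// track ID do not rescan the document. Assumes the dict's children
// alternate <key>track ID</key> and that track's <dict>.
function indexTracks(tracksDict) {
  var index = new Map();
  var children = Array.from(tracksDict.children);
  for (var i = 0; i + 1 < children.length; i += 2) {
    // Map the key's text (e.g. "1548") to the element that follows it
    index.set(children[i].textContent, children[i + 1]);
  }
  return index;
}

// Usage: var tracks = indexTracks(tracksDictElement);
//        var track = tracks.get("1548"); // no rescan after the first pass
```

This sketch trades one linear pass and some memory for fast repeated lookups. That is roughly the trade an application makes when it loads the whole library up front.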

I still can’t believe this structure is actually documented in a DTD…!

I’m not sure if it would even be possible to write an XSLT to turn the bad XML into good XML…!
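Setting XSLT aside, a DOM script can handle the conversion, because it needs only one rule: the element after a <key> is that key’s value. The JavaScript sketch below is a rough example. It turns one flat track <dict> into the nested form shown earlier. convertTrack, keyToTagName and escapeXml are hypothetical names. The key-to-element-name mapping is an assumption: “Track ID” becomes tid, as in the example above, and other keys simply lose their spaces (“Date Added” becomes DateAdded). Nested <dict>/<array> values are flattened to their text, booleans come out with an empty body, and keys that aren’t valid XML names would need more care.

```javascript
// Hypothetical converter: flat plist-style track <dict> -> nested <track> XML.
// Accepts DOM Elements or any object with tagName, textContent and children.

// Escape text for use in element content or a double-quoted attribute.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Assumed mapping: "Track ID" -> tid (as in the article); otherwise drop spaces.
function keyToTagName(key) {
  return key === "Track ID" ? "tid" : key.replace(/\s+/g, "");
}

function convertTrack(dictEl) {
  var children = Array.from(dictEl.children);
  var lines = [];
  var id = "";
  // Walk the children two at a time: <key>, then its value element.
  for (var i = 0; i + 1 < children.length; i += 2) {
    var key = children[i].textContent;
    var valueEl = children[i + 1];
    var tag = keyToTagName(key);
    if (key === "Track ID") id = valueEl.textContent;
    // The old value element's tag name (string, integer, ...) becomes "type".
    lines.push("  <" + tag + ' type="' + valueEl.tagName + '">' +
               escapeXml(valueEl.textContent) + "</" + tag + ">");
  }
  return '<track id="key' + escapeXml(id) + '">\n' + lines.join("\n") + "\n</track>";
}
```

Keeping the “key” prefix on the id attribute matters, because XML ID values cannot start with a digit.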
