Working with XML

I've been working with XML for a while now and I thought I'd share some of my experiences. This document describes some good working practices for XML and the tools I use to help me work with it.

Editing


The first thing you need when working with XML is an editor. There are plenty around, ranging from free tools like Notepad (Windows) and powerful cross-platform open source tools like Vi(m) and Emacs (*nix and Windows) and JEdit (Java), through commercial text editors like TextPad, UltraEdit and EditPlus, to very powerful but equally expensive commercial tools like Stylus Studio and XMLSpy.

I tried a few of them with varying degrees of success. When I'm working on my Linux systems I use Vi, but most of my development work is done on Windows at Keynetix, and I really needed a more complete solution for developing XML rather than just editing files. I trialled XMLSpy with interesting results. On small files it was great: the built-in IntelliSense is really useful and it's much more than just an editor. However, I ran into problems when parsing and validating large files that included schema from many sources.

The one I went for in the end was JEdit.

What's really clever about JEdit is its plugin system, which makes it much more than just an editor. On top of the base JEdit I have plugins for SideKick (shows the document in a tree view), ErrorList (reports errors during validation), XQuery (runs XPath queries inside JEdit), XSLT (transforms documents using XSLT stylesheets), XmlIndenter (automatically beautifies XML documents), Xerces (validates XML files using the Apache Xerces parser) and the XML plugin itself, which provides loads of tools for editing and validating XML.
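To give a feel for what an XPath query like the ones the XQuery plugin runs actually does, here is a minimal sketch that evaluates XPath 1.0 expressions outside the editor using the standard `javax.xml.xpath` API that ships with the JDK. The `XPathDemo` class, its `evaluate` helper and the sample order document are hypothetical, made up for this illustration; they are not part of JEdit or any of its plugins.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathDemo {
    // Hypothetical sample document used only for this illustration
    static final String ORDER_XML =
        "<order><item price=\"2.50\">Tea</item><item price=\"4.00\">Cake</item></order>";

    /** Evaluates an XPath 1.0 expression against an XML string and returns the result as a string. */
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            // Thrown if the expression is invalid or cannot be evaluated
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(ORDER_XML, "/order/item[1]"));          // Tea
        System.out.println(evaluate(ORDER_XML, "count(/order/item)"));      // 2
        System.out.println(evaluate(ORDER_XML, "sum(/order/item/@price)")); // 6.5
    }
}
```

The string form of the result follows XPath's own conversion rules, which is why the count prints as `2` rather than `2.0`.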

Validating and Parsing


What's all this talk of "validating" and "parsing", I hear you say? Parsing is the process of taking a text file and checking that it is an XML document, that is, that it contains exactly one root element with zero or more elements nested inside it. If this is the case, and all of the tags are properly closed and nested, the XML document is considered "well formed".
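As a rough illustration of a well-formedness check, the sketch below parses a string with the JDK's built-in JAXP parser (which is itself derived from Apache Xerces) with validation switched off, so only the well-formedness rules are applied. The `WellFormedCheck` class and `isWellFormed` method are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    /** Returns true if the text parses as a well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(false); // well-formedness only, no DTD or schema checks
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but stops the default handler printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<order><item>1</item></order>")); // true
        System.out.println(isWellFormed("<order><item>1</order>"));        // false: <item> is never closed
        System.out.println(isWellFormed("<a/><b/>"));                      // false: two root elements
    }
}
```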
Validating is the process of taking an XML file and checking its structure and content against a Schema (or a DTD, but I don't tend to work with them so I won't cover them here) to make sure it adheres to the rules defined in that Schema. The really clever thing about XML is that Schema are themselves XML documents, so whilst writing a Schema you can validate it against the Schema Definition.
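Validation can be sketched in a similar way with the standard `javax.xml.validation` API, again using the JDK's built-in implementation. The schema below is a tiny hypothetical example made up for illustration, as are the `SchemaCheck` class and its method names. Note that `newSchema` reads the schema as an XML document and checks it against the rules of XML Schema before anything can be validated against it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaCheck {
    // Hypothetical schema: <order> must contain one or more <item> elements holding integers
    static final String ORDER_XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"order\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"item\" type=\"xs:integer\" maxOccurs=\"unbounded\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Compiles a schema; the schema document itself is checked against the XML Schema rules. */
    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Invalid schema: " + e.getMessage(), e);
        }
    }

    /** Returns null if the document is valid, otherwise the first error the validator reports. */
    static String firstError(Schema schema, String xml) {
        try {
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            // Raised for well-formedness errors as well as schema-validity errors
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compile(ORDER_XSD);
        String[] docs = {
            "<order><item>1</item><item>2</item></order>", // valid
            "<order><item>abc</item></order>",             // well formed, but "abc" is not an integer
            "<order></order>",                             // well formed, but has no <item>
            "<order><item>1</order>"                       // not well formed, so it cannot be valid
        };
        for (String doc : docs) {
            String error = firstError(schema, doc);
            System.out.println(doc + " -> " + (error == null ? "valid" : "invalid: " + error));
        }
    }
}
```

Only the first document passes. The last one fails before any schema rules are even applied, because it doesn't parse. If the XSD itself broke the XML Schema rules, `compile` would throw instead.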

There are many ways to parse a file, but the easiest is simply to try to validate it: if it won't parse, it definitely won't validate! There are many validators (or Validating Processors) available, and some of the more advanced editing suites (XMLSpy and Stylus Studio) even include their own.

Sadly, not all parsers are equal! It's perfectly possible to write a valid document and have it seen as valid by one parser but not by another. This comes down to each parser's "conformance level". I tried several different parsers, and the one I found worked best was Xerces, the Apache project's implementation of an XML parser. It's reasonably easy to use, it's cross-platform since there's a version written in Java (Xerces-J), and more often than not it's correct: from what I could find, wherever there's been a "difference of opinion" between parsers, Xerces has come out as having the correct implementation.

I've written a short how-to document on using Xerces-J as a validator, including the little batch file that REALLY makes life easier!

Summary


There are lots of XML editing and parsing tools available, making it ever easier to write and process advanced XML data files that enforce data integrity inside the file and give the file "intelligence" about its content. Given how easy it is to check the validity of these files, there really is no reason for anyone ever to generate invalid XML. Or, perhaps more importantly, there's no reason for anyone to accept invalid XML!

The tools to ensure the correct formatting and content of typed XML documents are widely available, and in fact most of them are free. JEdit and its associated XML plugins integrate with the Xerces validating parser to give a fully integrated XML development environment, completely free of charge.