Pretty Printing XML with JAXP

Java has a wide selection of built-in XML handling capabilities, but they are little used by most developers because they are felt, unfairly I think, to be difficult to use. I freely admit that I fell into that camp until fairly recently, when I became a convert. Over the next few days I hope to write some short articles giving hints and tips on using JAXP and the built-in Java XML tools. First up is pretty printing.

Pretty printing XML is something that we all have to do from time to time but it’s not always obvious exactly how to achieve the desired output. A lot of the time I just splurge out XML and use the tools in Notepad++ to format it for me but when I’m writing a file that may be read by a human (other than me) I like to pretty print it on output. Accomplishing that with the JAXP libraries is done like this:

TransformerFactory factory = TransformerFactory.newInstance();
try {
	Transformer transformer = factory.newTransformer();
	transformer.setOutputProperty(OutputKeys.INDENT, "yes");
	transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
	DOMSource source = new DOMSource(doc);
	StreamResult result = new StreamResult(outputFile);
	transformer.transform(source, result);
} catch (TransformerException ex) {
	//Do something with the exception!
}

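The key lines are 4 and 5, which turn on and configure indentation in the built-in transformer. OutputKeys.INDENT is a standard output property, but the indent amount is not: it is an Apache-specific extension (note the namespace in its name), so there is no OutputKeys constant for it and the full property name has to be passed to the setOutputProperty method as a string. If you aren’t sure whether the transformer supports these options, surround those two lines in a try/catch block that catches IllegalArgumentException, which is what setOutputProperty throws for a property it rejects. Optionally you could also surround those lines with an if statement, allowing you to switch pretty printing on or off depending on your audience.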
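To make the try/catch and the if statement concrete, here is a minimal sketch of how I might wrap them up. The class name PrettyPrintToggle and the helpers toXml and sampleDocument are made up for this illustration, and I write to a String rather than a file purely so the result is easy to inspect. Treat it as one way of doing it under those assumptions, not the definitive approach.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class PrettyPrintToggle {

    // Hypothetical helper: serializes a DOM document, indenting only when asked to.
    public static String toXml(Document doc, boolean pretty) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        if (pretty) {
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            try {
                // Vendor-specific property; other transformers may not honour it.
                transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
            } catch (IllegalArgumentException ex) {
                // The transformer rejected the property; fall back to its default indentation.
            }
        }
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Builds a tiny made-up document so the example is self-contained.
    public static Document sampleDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("library");
        Element book = doc.createElement("book");
        book.setTextContent("JAXP");
        root.appendChild(book);
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = sampleDocument();
        System.out.println(toXml(doc, false)); // everything on one line
        System.out.println(toXml(doc, true));  // indented output
    }
}
```

In a real application the boolean would come from a configuration setting or a command line flag, so the same code path can serve both humans and machines.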

Why would you want to switch pretty printing off? The simple answer is that pretty printing costs some extra CPU time and the resulting documents are typically much larger, because every nested element gains a line break and its indentation. On a sample document I was producing, the output was over twice the size when pretty printed.

The four in quotes on line five tells the system how many spaces to indent each level of the XML. Some people like two spaces but I find that too little to be useful.
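If you want to see what the indent amount costs you in file size, a rough sketch like the one below will do. The class name IndentSizeComparison, the helpers serializedSize and sampleDocument, and the shape of the sample document are all invented for the illustration, and it assumes the built-in transformer that honours the Apache indent-amount property. The numbers you get will depend entirely on your own documents.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class IndentSizeComparison {

    // Hypothetical helper: returns the serialized size in bytes.
    // An indent of zero or less means "do not pretty print".
    public static int serializedSize(Document doc, int indent) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        if (indent > 0) {
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            // Assumption: the transformer understands this vendor-specific property.
            transformer.setOutputProperty(
                    "{http://xml.apache.org/xslt}indent-amount", String.valueOf(indent));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.size();
    }

    // Made-up sample data: <rows> containing 'rows' copies of <row><id>n</id></row>.
    public static Document sampleDocument(int rows) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("rows");
        doc.appendChild(root);
        for (int i = 0; i < rows; i++) {
            Element row = doc.createElement("row");
            Element id = doc.createElement("id");
            id.setTextContent(String.valueOf(i));
            row.appendChild(id);
            root.appendChild(row);
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = sampleDocument(1000);
        System.out.println("compact:  " + serializedSize(doc, 0) + " bytes");
        System.out.println("indent 2: " + serializedSize(doc, 2) + " bytes");
        System.out.println("indent 4: " + serializedSize(doc, 4) + " bytes");
    }
}
```

Short element names and short values, as in this sample, make the whitespace a large share of the total, so treat the result as a worst-case feel for the overhead rather than a benchmark.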