Validating SAX Parser


I’ve been working with the built-in Java XML libraries quite a bit lately, and one thing I’ve noticed is how few good snippets there are of what I call recipe code: little bits of code that show you how to complete a specific task. In this article I show you how to use JAXP 1.3/1.4 to load a schema from the classpath and create a validating SAX parser.

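When it comes to validating XML documents you have an almost bewildering array of options. There are three popular schema languages (DTD, XML Schema and RELAX NG) and two ways of specifying a schema for a document (in the document itself or manually in code). Then, in code, you have two (mostly) incompatible ways of setting a schema for validation: the JAXP 1.2 way of turning on validation and setting the schemaLanguage and schemaSource parser properties, and the JAXP 1.3 way of passing a compiled Schema object to the parser factory, which is the one I use below.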
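For comparison, here is a minimal sketch of the older JAXP 1.2 style of setting a schema in code, where the schema reaches the parser through two properties rather than as a compiled Schema object. The class name Jaxp12Validation and the inline schema and document strings are hypothetical stand-ins; the two property URIs are the ones JAXP 1.2 defines. Note that this style needs setValidating(true), the opposite of the setSchema approach.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper showing JAXP 1.2-style schema validation via parser properties.
final class Jaxp12Validation {
    static final String SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";

    private Jaxp12Validation() {}

    /** Returns true if xml is valid against xsd, false if an error is reported. */
    static boolean isValid(String xsd, String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // JAXP 1.2 schema validation only applies to a validating parser.
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            // Set the language first, then the schema itself.
            parser.setProperty(SCHEMA_LANGUAGE, XMLConstants.W3C_XML_SCHEMA_NS_URI);
            parser.setProperty(SCHEMA_SOURCE, new InputSource(new StringReader(xsd)));
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // DefaultHandler would otherwise ignore validation errors
                }
            });
            return true;
        } catch (SAXParseException e) {
            return false; // validation or well-formedness error
        } catch (Exception e) {
            throw new IllegalStateException(e); // configuration or I/O problem
        }
    }
}
```

Mixing the two styles, for example setting these properties on a parser whose factory already has a Schema, is what makes them "mostly incompatible".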

MyHandler handler = new MyHandler(); // your content handler (see below)

// Compile the schema, loading it from the classpath
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
StreamSource source = new StreamSource(getClass().getResourceAsStream("/some/path/schema.xsd"));
Schema schema = schemaFactory.newSchema(source);

// Every parser created by this factory validates against the schema
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
//factory.setValidating(true); // leave this off: it enables DTD validation
factory.setSchema(schema);

SAXParser saxParser = factory.newSAXParser();
saxParser.parse(new File("somefile.xml"), handler);

The code above shows you how to create a validating SAX parser. The first statement creates a new content handler, which I’ll come to in a moment. The next three statements create a Schema object: a compiled schema that can be used to validate an XML document. In this case I’m loading the schema from the classpath. I’ve not seen this done anywhere else, but I find it useful because the application always knows where to find the latest version of the schema.
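One possible refinement, sketched below on the assumption that the schema lives on the classpath as in the example: give SchemaFactory the resource's URL instead of an InputStream. SchemaFactory.newSchema(URL) uses the URL as the schema's system ID, so relative xs:include and xs:import locations can be resolved, and no stream from getResourceAsStream is left unclosed. The ClasspathSchemas class and its method names are hypothetical.

```java
import java.net.URL;
import javax.xml.XMLConstants;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: compile an XSD found on the classpath via its URL.
final class ClasspathSchemas {
    private ClasspathSchemas() {}

    /** Resolves path with anchor.getResource (so "/a/b.xsd" is absolute) and compiles it. */
    static Schema load(Class<?> anchor, String path) throws SAXException {
        URL url = anchor.getResource(path); // null if the resource does not exist
        if (url == null) {
            throw new IllegalArgumentException("Schema not found on classpath: " + path);
        }
        return fromUrl(url);
    }

    /** Compiles the schema at url; the URL is also the base for relative includes/imports. */
    static Schema fromUrl(URL url) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(url);
    }
}
```

In the example above, the StreamSource lines would then collapse to `Schema schema = ClasspathSchemas.load(getClass(), "/some/path/schema.xsd");`, and a missing resource fails with a clear message instead of a null stream.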

Once the schema has been loaded, create a SAX parser factory and set the schema on it. Every parser created by this factory will then hold a reference to the schema and validate against it. It’s generally a good idea to make all parsers namespace aware as well, unless you have a good reason not to. The final thing to note about the parser factory is that you don’t set its validating flag to true. This seems strange, and it threw me for a while, but if you set a schema and turn on validation you’ll get an error message like this:

Document is invalid: no grammar found.

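What is happening is that the validating flag controls DTD validation, which is separate from the schema set on the factory. With the flag on, the parser looks for a DOCTYPE declaration pointing to a DTD, and because we are supplying the schema manually in code it doesn’t find one, so it reports that no grammar was found.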
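The self-contained sketch below reproduces this. It is hypothetical demo code: the schema and document are inline strings rather than a classpath resource and a file, and the handler collects error messages instead of throwing so the two settings can be compared. With setValidating(true), a document that is valid against the schema still produces errors, because it has no DOCTYPE.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: what setValidating(true) adds on top of setSchema(...).
final class ValidatingFlagDemo {
    private ValidatingFlagDemo() {}

    /** Parses xml against xsd and returns the messages of all errors reported to the handler. */
    static List<String> errors(String xsd, String xml, boolean validating) {
        List<String> messages = new ArrayList<>();
        try {
            SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(xsd)));

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(validating); // true switches on DTD validation as well
            factory.setSchema(schema);

            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    messages.add(e.getMessage()); // collect rather than throw, to see every error
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return messages;
    }
}
```

On the JDK's built-in parser the validating run should include the "no grammar found" message quoted above; the exact wording may differ between parser implementations.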

The last two lines create a SAX parser and parse the document. Coming back to the content handler, though, there is one more step needed to make sure validation errors actually reach your application. The handler must override the following ErrorHandler methods with something like the code below.

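// Called for warnings; rethrowing treats them as failures too (optional)
@Override
public void warning(SAXParseException e) throws SAXException {
    throw e;
}

// Called for recoverable errors, including schema validation errors
@Override
public void error(SAXParseException e) throws SAXException {
    throw e;
}

// Called for well-formedness errors; DefaultHandler already rethrows these
@Override
public void fatalError(SAXParseException e) throws SAXException {
    throw e;
}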

Most content handlers you write will extend DefaultHandler, whose warning and error methods do nothing at all (only fatalError rethrows the SAXParseException it receives). Schema validation failures are reported through error, so they are silently discarded: your application never hears about a validation failure, and it looks as if validation isn’t working.
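To see the difference, this sketch validates a document that breaks its schema twice: once with a plain DefaultHandler and once with a handler that rethrows as described above. The class names and inline strings are hypothetical. Only the second parse fails.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: DefaultHandler vs. a handler that rethrows validation errors.
final class SwallowedErrorsDemo {
    private SwallowedErrorsDemo() {}

    /** Rethrows everything; DefaultHandler itself only rethrows from fatalError. */
    static class StrictHandler extends DefaultHandler {
        @Override public void warning(SAXParseException e) throws SAXException { throw e; }
        @Override public void error(SAXParseException e) throws SAXException { throw e; }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
    }

    /** Returns true if parsing xml (validated against xsd) finished without an exception. */
    static boolean parseCompletes(String xsd, String xml, DefaultHandler handler) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Schema does not compile", e);
        }
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setSchema(schema); // validating flag left off (it would add DTD validation)
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return true;
        } catch (SAXParseException e) {
            return false; // the handler rethrew a reported problem
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the plain DefaultHandler the invalid document "parses" successfully, which is exactly the symptom of validation appearing not to work.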

Note: JAXP 1.4 is a maintenance release of 1.3.
