XPath, JAXP and the Default Namespace

So you’ve just finished parsing a document with your validating, namespace-aware document factory, you’ve gone to select some nodes using XPath, and nothing has appeared (no nodes were selected). You’ve tried changing the XPath expression, making it simpler and simpler, and still nothing. Chances are it’s not the XPath that’s at fault, it’s your XML.

To finish at the end, as that’s always a good place to start: the problem is that XPath is namespace aware. If the elements in your XML document are in a namespace (including the default namespace) then every element name in an XPath expression must carry a prefix bound to that namespace, or it won’t find any nodes. This is a real gotcha because the only difference between a document with no namespace and one with a default namespace is the declaration in the root node (or elsewhere, I suppose), and it’s really easy to forget about when viewing pages of XML that has no prefixes.

Annoyingly, this is also somewhat tiresome to deal with in JAXP, but I’m getting ahead of myself. First I’d like to explain the problem in more detail.

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <example>Hello World</example>
</root>

So pretend you have the simple little XML document shown above. If you want to select the example node you could use the XPath expression //example. It’s perhaps not the best way to get to the node but it works. Now let’s declare a default namespace in the XML document, as shown below.

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns="http://www.wobblycogs.co.uk/example">
    <example>Hello World</example>
</root>

If you try out the //example XPath expression now you’ll find it won’t return any nodes. Why? Simple: the XPath is asking for all elements called example that aren’t in any namespace. In this document all the elements are in the default namespace, which is completely different to having no namespace even though the elements themselves look the same. If it helps to visualize things: in the first document there is no prefix, in the second document there is an invisible prefix that points to the URI “http://www.wobblycogs.co.uk/example”.
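To see the difference for yourself, here is a minimal sketch that runs //example against both documents and counts the matches. The class and method names (DefaultNamespaceDemo, countMatches) are made up for this illustration, and the XML is inlined as strings rather than read from a file. It assumes the parser has been made namespace aware, which JAXP does not do by default.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article's code.
class DefaultNamespaceDemo {

    // The first document: no namespace declared.
    static final String PLAIN =
        "<root><example>Hello World</example></root>";

    // The second document: the same elements in a default namespace.
    static final String NAMESPACED =
        "<root xmlns=\"http://www.wobblycogs.co.uk/example\">"
        + "<example>Hello World</example></root>";

    // Parses the XML with a namespace aware parser and counts the nodes the
    // expression selects. Checked exceptions are wrapped to keep the sketch short.
    static int countMatches(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // off by default in JAXP
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countMatches(PLAIN, "//example"));      // prints 1
        System.out.println(countMatches(NAMESPACED, "//example")); // prints 0
    }
}
```

The expression is identical in both calls; only the xmlns declaration on the root element changes the result.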

The fact the prefix is invisible makes life a little awkward because you can’t type an invisible prefix (my keyboard used to have an invisible key but I can’t find it anymore), so you can’t use the empty default prefix in XPath: it would be impossible to tell the difference between no prefix and the default prefix. What you have to do is tell the XPath processor which prefix you are going to use in the XPath expression to refer to nodes that are in the default namespace. The glue that holds this together is the URI given in the document’s default namespace declaration.

It works like this: you tell the XPath processor that when it sees the prefix “ns” in an XPath expression it maps to the URI “http://www.wobblycogs.co.uk/example”. The document you are querying has that same URI specified as the default namespace, so when you issue the XPath query //ns:example the XPath processor knows to look for nodes called example in the default namespace.

Notice that what you’ve done is map the empty default prefix to one with the name “ns”. This mapping isn’t limited to just the default namespace. If your document declared another namespace with the prefix “foo”, for example, you could tell the XPath processor to refer to it with the prefix “bar”, because only the URI has to match. This is handy when you are processing documents from multiple sources that don’t necessarily always use the same prefix.
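The sketch below illustrates that prefix independence under a few assumptions of my own: two inlined documents that use the same namespace URI, one as the default namespace and one with the prefix “foo”, and a single XPath expression that uses “bar”. The class name PrefixRemappingDemo and the tiny anonymous NamespaceContext are invented for the example (NamespaceContext itself is the JAXP interface described in the next paragraph).

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article's code.
class PrefixRemappingDemo {

    static final String URI = "http://www.wobblycogs.co.uk/example";

    // Same namespace URI, declared once as the default and once as "foo".
    static final String DEFAULT_NS_DOC =
        "<root xmlns=\"" + URI + "\"><example>Hello World</example></root>";
    static final String FOO_PREFIX_DOC =
        "<foo:root xmlns:foo=\"" + URI + "\">"
        + "<foo:example>Hello World</foo:example></foo:root>";

    // Binds only the prefix "bar"; every other prefix is left unmapped.
    static final NamespaceContext BAR_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "bar".equals(prefix) ? URI : null;
        }
        @Override
        public String getPrefix(String namespaceURI) { return null; }
        @Override
        public Iterator<String> getPrefixes(String namespaceURI) { return null; }
    };

    // Counts the nodes selected by the expression, resolving prefixes via BAR_CONTEXT.
    static int countMatches(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(BAR_CONTEXT);
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countMatches(DEFAULT_NS_DOC, "//bar:example")); // prints 1
        System.out.println(countMatches(FOO_PREFIX_DOC, "//bar:example")); // prints 1
    }
}
```

Neither document contains the prefix “bar”, yet both match, because the XPath processor compares namespace URIs rather than prefixes.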

In JAXP the way to provide this mapping to the XPath processor is by using a NamespaceContext, which has three methods, only one of which is of any real use here: getNamespaceURI. This method maps from a prefix to a URI and is how the XPath processor links the prefixes in the XPath to the document namespaces. There’s no default implementation of NamespaceContext but I find the code below is generally enough.

package uk.co.wobblycogs.xml;

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.namespace.NamespaceContext;

/** Minimal NamespaceContext backed by a prefix-to-URI map. */
public class SimpleNamespaceContext implements NamespaceContext {

	private final Map<String, String> prefixMap = new HashMap<>();

	public SimpleNamespaceContext() {}

	public SimpleNamespaceContext(String prefix, String uri) {
		prefixMap.put(prefix, uri);
	}

	public void addPrefixMapping(String prefix, String uri) {
		prefixMap.put(prefix, uri);
	}

	// Used by the XPath processor to resolve a prefix to its namespace URI.
	// Returns null when the prefix has not been mapped.
	@Override
	public String getNamespaceURI(String prefix) {
		return prefixMap.get(prefix);
	}

	// Reverse lookups (URI to prefix) are left unimplemented in this simple version.
	@Override
	public String getPrefix(String namespaceURI) { return null; }

	@Override
	public Iterator<String> getPrefixes(String namespaceURI) { return null; }
}

The overall XPath query structure therefore looks like this:

import javax.xml.xpath.*;
import org.w3c.dom.NodeList;

// expression is the XPath string, node is the parsed document (or any context node)
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
xpath.setNamespaceContext(new SimpleNamespaceContext("ns", "http://www.wobblycogs.co.uk/example"));
XPathExpression expr = xpath.compile(expression);
NodeList result = (NodeList) expr.evaluate(node, XPathConstants.NODESET);

The interesting part of the above code is where the namespace context is set on the XPath object. Obviously you could add further mappings if you have more namespaces in your document.
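As a rough sketch of what additional mappings might look like, the example below queries a document that uses two namespaces. The second namespace URI (“http://www.wobblycogs.co.uk/meta”), the author element, and the names MultipleNamespacesDemo, contextOf and select are all invented for illustration. The contextOf helper is just a compact stand-in for SimpleNamespaceContext so that the block compiles on its own.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article's code.
class MultipleNamespacesDemo {

    // The "meta" namespace URI and the author element are made up for this example.
    static final String XML =
        "<root xmlns=\"http://www.wobblycogs.co.uk/example\""
        + " xmlns:m=\"http://www.wobblycogs.co.uk/meta\">"
        + "<example>Hello World</example>"
        + "<m:author>Anonymous</m:author>"
        + "</root>";

    // Compact stand-in for SimpleNamespaceContext: prefix in, URI out.
    static NamespaceContext contextOf(Map<String, String> prefixes) {
        return new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) { return prefixes.get(prefix); }
            @Override
            public String getPrefix(String namespaceURI) { return null; }
            @Override
            public Iterator<String> getPrefixes(String namespaceURI) { return null; }
        };
    }

    // Evaluates the expression against XML and returns the string value of the result.
    static String select(String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            // One entry per namespace; the prefixes are ours, not the document's.
            Map<String, String> prefixes = new HashMap<>();
            prefixes.put("ns", "http://www.wobblycogs.co.uk/example");
            prefixes.put("meta", "http://www.wobblycogs.co.uk/meta");

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(contextOf(prefixes));
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("/ns:root/ns:example"));  // prints Hello World
        System.out.println(select("/ns:root/meta:author")); // prints Anonymous
    }
}
```

Note that the document declares the second namespace with the prefix “m” while the XPath uses “meta”; as before, only the URIs need to agree.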