I've spent the past day attempting to extract a single XML node from the following document, and I can't grasp the nuances of XML namespaces well enough to make it work.
The XML file is too large to post in full, so here is the portion that concerns me:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<XFDL xmlns="http://www.PureEdge.com/XFDL/6.5" xmlns:custom="http://www.PureEdge.com/XFDL/Custom" xmlns:designer="http://www.PureEdge.com/Designer/6.1" xmlns:pecs="http://www.PureEdge.com/PECustomerService" xmlns:xfdl="http://www.PureEdge.com/XFDL/6.5">
  <globalpage sid="global">
    <global sid="global">
      <xmlmodel xmlns:xforms="http://www.w3.org/2003/xforms">
        <instances>
          <xforms:instance id="metadata">
            <form_metadata>
              <metadataver version="1.0"/>
              <metadataverdate>
                <date day="05" month="Jul" year="2005"/>
              </metadataverdate>
              <title>
                <documentnbr number="2062" prefix.army="DA" scope="army" suffix=""/>
                <longtitle>HAND RECEIPT/ANNEX NUMBER </longtitle>
              </title>
The document continues and is well formed all the way down. I am attempting to extract the "number" attribute from the "documentnbr" tag (three from the bottom).
The code that I'm using to do this looks like this:
/***
 * Locates the Document Number information in the file and returns the form number.
 * @return File's self-declared number.
 * @throws InvalidFormException Thrown when XPath cannot find the "documentnbr" element in the file.
 */
public String getFormNumber() throws InvalidFormException
{
    try{
        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(new XFDLNamespaceContext());
        Node result = (Node)xPath.evaluate(QUERY_FORM_NUMBER, doc, XPathConstants.NODE);
        if(result != null) {
            return result.getNodeValue();
        } else {
            throw new InvalidFormException("Unable to identify form.");
        }
    } catch (XPathExpressionException err) {
        throw new InvalidFormException("Unable to find form number in file.");
    }
}
Where QUERY_FORM_NUMBER is my XPath expression, and XFDLNamespaceContext implements NamespaceContext and looks like this:
public class XFDLNamespaceContext implements NamespaceContext {
    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new NullPointerException("Invalid Namespace Prefix");
        else if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX))
            return "http://www.PureEdge.com/XFDL/6.5";
        else if ("custom".equals(prefix))
            return "http://www.PureEdge.com/XFDL/Custom";
        else if ("designer".equals(prefix))
            return "http://www.PureEdge.com/Designer/6.1";
        else if ("pecs".equals(prefix))
            return "http://www.PureEdge.com/PECustomerService";
        else if ("xfdl".equals(prefix))
            return "http://www.PureEdge.com/XFDL/6.5";
        else if ("xforms".equals(prefix))
            return "http://www.w3.org/2003/xforms";
        else
            return XMLConstants.NULL_NS_URI;
    }
    @Override
    public String getPrefix(String arg0) {
        // TODO Auto-generated method stub
        return null;
    }
    @Override
    public Iterator getPrefixes(String arg0) {
        // TODO Auto-generated method stub
        return null;
    }
}
I've tried many different XPath queries but I keep feeling like this should work:
protected static final String QUERY_FORM_NUMBER =
    "/globalpage/global/xmlmodel/xforms:instances/instance" +
    "/form_metadata/title/documentnbr[number]";
Unfortunately it does not work and I continually get a null return.
I've done a fair amount of reading here, here, and here, but nothing has proved sufficiently illuminating to help me get this working.
I'm almost positive that I'm going to face-palm when I figure this out but I'm really at wit's end as to what I'm missing.
Thank you for reading through all of this and thanks in advance for the help.
-Andy
SAX (alternative to XPath) version:
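As an illustration of the SAX route (this is a minimal sketch, not the answerer's original listing), a handler can simply watch for the first documentnbr element and read its number attribute. The class name SaxFormNumber, the method findFormNumber, and the shortened sample document are assumptions made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named for this sketch only.
public class SaxFormNumber {

    // Shortened stand-in for the real form; only the relevant element is kept.
    public static final String SAMPLE =
        "<XFDL xmlns=\"http://www.PureEdge.com/XFDL/6.5\">"
        + "<title><documentnbr number=\"2062\" scope=\"army\"/></title>"
        + "</XFDL>";

    /** Returns the "number" attribute of the first documentnbr element, or null if none is found. */
    public static String findFormNumber(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // ensures localName is populated in startElement
        SAXParser parser = factory.newSAXParser();

        final String[] found = new String[1]; // holder the anonymous handler can write to
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (found[0] == null && "documentnbr".equals(localName)) {
                    // The attribute is unprefixed, so it is in no namespace ("").
                    found[0] = attrs.getValue("", "number");
                }
            }
        };
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return found[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(findFormNumber(SAMPLE));
    }
}
```

The sketch matches on the local name only, so it sidesteps prefix handling entirely. If other namespaces could also contain a documentnbr element, the uri argument would need to be checked as well.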
I see that using XPath with namespaces is more complicated than it should be (my opinion). Here is my (simple) code:
You can get the NamespaceContextMap class (not mine) from here (GPL license). See also bug 6376058.
Aha, I tried to debug your expression and got it to work. You missed a few things. This XPath expression should do it:
If you replace instance with xforms:instance, then getNamespaceURI() gets called once with xforms as the input argument, but the program throws an exception.
To select an attribute, the syntax is @attr, not [attr].
My complete sample code:
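For illustration (this is not the answerer's original listing), here is a minimal sketch of one way to query the attribute when the document is parsed with a namespace-aware DocumentBuilderFactory. In XPath 1.0 an unprefixed name in an expression refers to no namespace, so elements in the document's default namespace need a bound prefix; the sketch uses xfdl for that, starts from the root element XFDL, and selects the attribute with @number. The class name XPathFormNumber, the method findFormNumber, and the closing tags added to complete the sample document are assumptions.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this sketch only.
public class XPathFormNumber {

    // The posted fragment, with closing tags added (an assumption) so it parses.
    public static final String SAMPLE =
        "<XFDL xmlns=\"http://www.PureEdge.com/XFDL/6.5\""
        + " xmlns:xfdl=\"http://www.PureEdge.com/XFDL/6.5\">"
        + "<globalpage sid=\"global\"><global sid=\"global\">"
        + "<xmlmodel xmlns:xforms=\"http://www.w3.org/2003/xforms\"><instances>"
        + "<xforms:instance id=\"metadata\"><form_metadata><title>"
        + "<documentnbr number=\"2062\" prefix.army=\"DA\" scope=\"army\" suffix=\"\"/>"
        + "</title></form_metadata></xforms:instance>"
        + "</instances></xmlmodel></global></globalpage></XFDL>";

    // Every element in the default namespace gets the xfdl prefix; only instance is in xforms.
    static final String QUERY =
        "/xfdl:XFDL/xfdl:globalpage/xfdl:global/xfdl:xmlmodel/xfdl:instances"
        + "/xforms:instance/xfdl:form_metadata/xfdl:title/xfdl:documentnbr/@number";

    /** Returns the form number, or an empty string if the path matches nothing. */
    public static String findFormNumber(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // off by default; required for prefixed XPath steps
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));

        final Map<String, String> ns = new HashMap<>();
        ns.put("xfdl", "http://www.PureEdge.com/XFDL/6.5");
        ns.put("xforms", "http://www.w3.org/2003/xforms");

        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                if (prefix == null) throw new IllegalArgumentException("prefix is null");
                String uri = ns.get(prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }
            @Override
            public String getPrefix(String uri) {
                return null; // reverse lookup is not implemented in this sketch
            }
            @Override
            public Iterator<String> getPrefixes(String uri) {
                return Collections.<String>emptyIterator(); // likewise not implemented
            }
        });
        return xPath.evaluate(QUERY, doc); // string value of the selected attribute
    }

    public static void main(String[] args) throws Exception {
        System.out.println(findFormNumber(SAMPLE));
    }
}
```

The prefixes in the expression only have to match the NamespaceContext, not the prefixes used in the file, because the comparison is done on namespace URIs.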
Have a look at the XPathAPI library. It is a simpler way to use XPath without messing with the low-level Java API, especially when dealing with namespaces.
The code to get the number attribute would be:
Namespaces are automatically extracted from the root node (doc in this case). In case you need to explicitly define additional namespaces you can use this:
(Disclaimer: I'm the author of the library.)