I'm using XOM with the following sample data:

```xml
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
<head>
<title>Patient Information</title>
</head>
</html>
```

and querying it like this:

```java
Element root = cleanDoc.getRootElement();
// Debugging query: select every element. (The real query looks for the bold elements, as those mark institution and clinic.)
Nodes nodes = root.query("//*");
```
The XPath expression `//*` returns many elements (on the real data), but something like `//head` returns nothing. If I iterate over the children of the root, the counts match up, and if I print each element's name, everything looks correct.
I'm taking HTML, parsing it with TagSoup, and then building a XOM Document from the resulting string. What part of this could go so horribly wrong? I suspect some odd encoding issue, but I'm just not seeing it. Java Strings are Strings, right?
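One thing worth ruling out before blaming encoding: the sample's root declares a default namespace (`xmlns="http://www.w3.org/1999/xhtml"`), so every element in it lives in the XHTML namespace, and in XPath 1.0 an unprefixed name test such as `head` only matches elements in *no* namespace. The sketch below reproduces that behaviour on the sample above. It is a stand-in that uses only the JDK's `javax.xml` APIs rather than XOM (so it runs without extra jars), with a namespace-aware parser to mirror XOM, which is always namespace-aware. The class name `NamespaceXPathDemo` and the prefix `x` are hypothetical choices for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceXPathDemo {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // The sample data from the question, as a single string.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:html=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>Patient Information</title></head></html>";

    // Binds the hypothetical prefix "x" to the XHTML namespace for XPath evaluation.
    static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "x".equals(prefix) ? XHTML : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri) {
            return XHTML.equals(uri) ? "x" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            return XHTML.equals(uri)
                ? Collections.singletonList("x").iterator()
                : Collections.<String>emptyIterator();
        }
    };

    // Parse with namespace awareness switched on, as XOM always does.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Returns how many nodes an XPath 1.0 expression selects; ctx may be null.
    static int count(Document doc, String expr, NamespaceContext ctx) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        if (ctx != null) {
            xpath.setNamespaceContext(ctx);
        }
        NodeList hits = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        Element root = doc.getDocumentElement();
        // The local name prints as "html", but the element is in the XHTML namespace.
        System.out.println(root.getLocalName() + " in " + root.getNamespaceURI());
        System.out.println("//*      -> " + count(doc, "//*", null));
        System.out.println("//head   -> " + count(doc, "//head", null));
        System.out.println("//x:head -> " + count(doc, "//x:head", XHTML_CONTEXT));
    }
}
```

On the sample this prints counts of 3, 0 and 1: the same "everything matches `//*` but nothing matches `//head`" pattern, with no encoding involved. If this is the cause, the XOM counterpart would be to bind a prefix and use it in the query, e.g. `root.query("//x:head", new XPathContext("x", "http://www.w3.org/1999/xhtml"))`; printing `getNamespaceURI()` alongside the element name is a quick way to confirm which namespace the TagSoup output actually uses.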