I have this XPath expression that I'm putting into HtmlCleaner:
//table[@class='StandardTable']/tbody/tr[position()>1]/td[2]/a/img
Now, my issue is that the markup changes, and sometimes the /a/img element is not present. So I would like an expression that gets all elements
//table[@class='StandardTable']/tbody/tr[position()>1]/td[2]/a/img
when /a/img is present, and
//table[@class='StandardTable']/tbody/tr[position()>1]/td[2]
when /a/img is not present.
Does anyone have any idea how to do this? I found something in another question that looks like it might help me
descendant-or-self::*[self::body or self::span/parent::body]
but I don't understand it.
Thanks in advance.
You can select the union of two mutually exclusive expressions (notice the | union operator):
//table[@class='StandardTable']/tbody/tr[position()>1]/td[2]/a/img|
//table[@class='StandardTable']/tbody/tr[position()>1]/td[2][not(a/img)]
For any given cell, exactly one of the two expressions matches: the first when the cell contains a/img, the second when it does not. That means you'll always get just the required nodes.
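To check how the union behaves, here is a minimal sketch using plain JAXP on a small well-formed document. The sample markup and the names UnionXPathDemo, SAMPLE and selectedNames are made up for illustration; with real pages you would first have to get HtmlCleaner's output into a form a standard XPath engine accepts.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class UnionXPathDemo {
    // Hypothetical sample: row 1 is a header, row 2 has a/img in td[2], row 3 does not.
    static final String SAMPLE =
        "<html><body><table class='StandardTable'><tbody>"
      + "<tr><td>h1</td><td>h2</td></tr>"
      + "<tr><td>a</td><td><a href='#'><img src='x.png'/></a></td></tr>"
      + "<tr><td>b</td><td>plain</td></tr>"
      + "</tbody></table></body></html>";

    static final String CELL =
        "//table[@class='StandardTable']/tbody/tr[position()>1]/td[2]";

    // Union of the two mutually exclusive branches from the answer.
    static final String EXPR = CELL + "/a/img | " + CELL + "[not(a/img)]";

    // Returns the element names of the selected nodes.
    static List<String> selectedNames(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                EXPR, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectedNames(SAMPLE)); // prints [img, td]
    }
}
```

Because the not(a/img) predicate is evaluated per cell, a table that mixes both kinds of rows yields a mix of img and td nodes.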
From your comments on @Dimitre's answer, I see that HtmlCleaner doesn't fully support XPath 1.0. You don't really need it to. You just need HtmlCleaner to parse input that isn't well-formed. Once it has done that job, convert its output into a standard org.w3c.dom.Document and treat it as XML.
Here's a conversion example:
TagNode tagNode = new HtmlCleaner().clean("<html><div><p>test");
Document doc = new DomSerializer(new CleanerProperties()).createDOM(tagNode);
From here on out, just use JAXP with whatever implementation you want:
XPath xpath = XPathFactory.newInstance().newXPath();
Node node = (Node) xpath.evaluate("/html/body/div/p[not(child::*)]",
doc, XPathConstants.NODE);
System.out.println(node.getTextContent());
Output:
test
Use:
(//table[@class='StandardTable']
/tbody/tr)
[position()>1]
/td[2]
[not(a/img)]
|
(//table[@class='StandardTable']
/tbody/tr)
[position()>1]
/td[2]
/a/img
In general, if we want to select one node-set ($ns1) when some condition $cond is true and to select another node-set ($ns2) otherwise, this can be specified with the following single XPath expression:
$ns1[$cond] | $ns2[not($cond)]
In this particular case, $ns1 is:
(//table[@class='StandardTable']
/tbody/tr)
[position()>1]
/td[2]
/a/img
and $ns2 is:
(//table[@class='StandardTable']
/tbody/tr)
[position()>1]
/td[2]
And $cond is:
boolean( (//table[@class='StandardTable']
/tbody/tr)
[position()>1]
/td[2]
/a/img
)
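As a rough illustration of the general pattern, the sketch below builds $ns1[$cond] | $ns2[not($cond)] by string concatenation and evaluates it with JAXP. The class ConditionalSelectDemo and the helpers either, table and select are hypothetical names, and the tiny tables are invented test input. Note that with $cond defined as boolean($ns1), the condition is document-wide: if any matching a/img exists, only the img nodes are returned.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ConditionalSelectDemo {
    static final String CELL =
        "(//table[@class='StandardTable']/tbody/tr)[position()>1]/td[2]";

    // Builds "$ns1[$cond] | $ns2[not($cond)]" from plain expression strings.
    static String either(String ns1, String ns2, String cond) {
        return "(" + ns1 + ")[" + cond + "] | (" + ns2 + ")[not(" + cond + ")]";
    }

    // Hypothetical helper: a one-table document with a header row,
    // then one row per given content of the second cell.
    static String table(String... secondCells) {
        StringBuilder sb = new StringBuilder(
            "<table class='StandardTable'><tbody><tr><td>h1</td><td>h2</td></tr>");
        for (String cell : secondCells) {
            sb.append("<tr><td>x</td><td>").append(cell).append("</td></tr>");
        }
        return sb.append("</tbody></table>").toString();
    }

    // Evaluates the combined expression and returns the selected element names.
    static List<String> select(String xml) {
        String ns1 = CELL + "/a/img";
        String expr = either(ns1, CELL, "boolean(" + ns1 + ")");
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(table("<a><img/></a>", "plain"))); // prints [img]
        System.out.println(select(table("plain", "plain")));         // prints [td, td]
    }
}
```

If you want the decision made per cell instead, use a relative condition such as a/img as the predicate on td[2], as in the expression under "Use:".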
This is ugly and it may not even work (exists() needs XPath 2.0), but the principle should:
//table[@class='StandardTable']/tbody/tr[position()>1]/td[2][exists(a/img)]/a/img | //table[@class='StandardTable']/tbody/tr[position()>1]/td[2][not(exists(a/img))]