I want to select elements by class alone, regardless of tag name: the class date, like the CSS selector .date
For some reason, I cannot get this to work. If anyone knows what is wrong with my code, it would be much appreciated.
@$doc = new DOMDocument();
@$doc->loadHTML($html);
$xml = simplexml_import_dom($doc); // just to make xpath more simple
$images = $xml->xpath('//[@class="date"]');
foreach ($images as $img)
{
echo $img." ";
}
I want to write the canonical answer to this question because the answer above has a problem.
Our problem
The CSS selector:
.foo
will select any element that has the class foo.
How do you do this in XPath?
Although XPath is more powerful than CSS, XPath doesn't have a native equivalent of a CSS class selector. However, there is a solution.
The right way to do it
The equivalent selector in XPath is:
//*[contains(concat(" ", normalize-space(@class), " "), " foo ")]
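The function normalize-space strips leading and trailing whitespace and replaces each run of whitespace characters with a single space. Padding the result with a space on each side then means " foo " can only match foo as a whole, space-separated word.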
In a more general sense, this is also the equivalent of the CSS selector:
*[class~="foo"]
which will match any element whose class attribute value is a list of whitespace-separated values, one of which is exactly equal to foo.
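A minimal PHP sketch of that selector, assuming the DOM extension is available. PHP's DOMXPath only implements XPath 1.0, which is all this expression needs. The `findByClass` helper and the sample markup are hypothetical, made up for illustration, and the class name is assumed not to contain a double quote.

```php
<?php
// Hypothetical helper: return every element whose class attribute
// contains $class as a whole, whitespace-separated token.
function findByClass(DOMDocument $doc, string $class): DOMNodeList
{
    $xpath = new DOMXPath($doc);
    // Normalize the attribute, pad it with spaces, then look for " name ".
    // Assumes $class contains no double quote.
    $expr = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    return $xpath->query($expr);
}

$doc = new DOMDocument();
// Sample markup: a, b and c carry the class foo; d only has "foobar".
@$doc->loadHTML(
    '<div id="a" class="foo"></div>'
    . '<div id="b" class="foo bar"></div>'
    . '<div id="c" class=" foo "></div>'
    . '<div id="d" class="foobar"></div>'
);

// a, b and c match; d does not.
foreach (findByClass($doc, 'foo') as $el) {
    echo $el->getAttribute('id'), "\n";
}
```

For the question's case, `findByClass($doc, 'date')` would select every element carrying the date class, whatever other classes it has.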
A couple of obvious, but wrong ways to do it
The XPath selector:
//*[@class="foo"]
doesn't work, because it won't match an element that has more than one class, for example:
<div class="foo bar">
It also won't match if there is any extra whitespace around the class name:
<div class=" foo ">
The 'improved' XPath selector
//*[contains(@class, "foo")]
doesn't work either, because it wrongly matches elements whose class merely contains foo as a substring, such as foobar:
<div class="foobar">
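To see both failures side by side, here is a small PHP sketch (DOM extension assumed; the `matchedIds` helper and the markup are hypothetical) that runs the two selectors over the kinds of class attribute discussed above.

```php
<?php
$doc = new DOMDocument();
@$doc->loadHTML(
    '<div id="a" class="foo"></div>'
    . '<div id="b" class="foo bar"></div>'
    . '<div id="c" class=" foo "></div>'
    . '<div id="d" class="foobar"></div>'
);
$xpath = new DOMXPath($doc);

// Hypothetical helper: collect the id attributes of the matched elements.
function matchedIds(DOMXPath $xpath, string $expr): array
{
    $ids = [];
    foreach ($xpath->query($expr) as $el) {
        $ids[] = $el->getAttribute('id');
    }
    return $ids;
}

// Exact comparison: only "a" matches; "foo bar" and " foo " are missed.
print_r(matchedIds($xpath, '//*[@class="foo"]'));
// Substring test: all four match, including the unwanted "foobar".
print_r(matchedIds($xpath, '//*[contains(@class, "foo")]'));
```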
Credit goes to this fella, who published the earliest solution to this problem that I found on the web:
http://dubinko.info/blog/2007/10/01/simple-parsing-of-space-seprated-attributes-in-xpathxslt/
//[@class="date"]
is not a valid XPath expression: the step after // has no node test (such as * or img).
Try //*[@class="date"], or, if you know it is an image, //img[@class="date"].
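A minimal corrected version of the question's code, as a sketch: the `$html` sample is made up, and the loop echoes `getName()` instead of the node's text so that elements without text content (like an img) still show up.

```php
<?php
// Hypothetical sample input.
$html = '<p><img class="date" src="a.png"><span class="date">2016-12-13</span></p>';

$doc = new DOMDocument();
@$doc->loadHTML($html);
$xml = simplexml_import_dom($doc); // just to make xpath more simple

// Any element whose class attribute is exactly "date".
foreach ($xml->xpath('//*[@class="date"]') as $node) {
    echo $node->getName(), "\n"; // img, then span
}

// Only <img> elements whose class attribute is exactly "date".
$images = $xml->xpath('//img[@class="date"]');
echo count($images), "\n"; // 1
```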
XPath 3.1 introduces a function contains-token and thus finally solves this ‘officially’. It is designed to support classes.
Example:
//*[contains-token(@class, "foo")]
This function makes sure that all whitespace (not only the space character, U+0020) is handled correctly, works when a class name is repeated, and generally covers the edge cases.
Note: as of today (2016-12-13), XPath 3.1 has the status of a W3C Candidate Recommendation.
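PHP's DOMXPath only implements XPath 1.0, so contains-token cannot be called from it. To make the behaviour described above concrete, here is a hypothetical PHP function, `containsToken`, that approximates it for a single string: the token is trimmed, an empty token never matches, and the value is split on XML whitespace (space, tab, CR, LF). It is a sketch of my reading of the semantics, not the spec's definition; for instance it ignores the optional collation argument.

```php
<?php
// Hypothetical approximation of XPath 3.1 fn:contains-token for one string.
// Splits on XML whitespace (space, tab, CR, LF), not only U+0020.
function containsToken(string $value, string $token): bool
{
    $token = trim($token, " \t\r\n");
    if ($token === '') {
        return false; // an empty token never matches
    }
    $tokens = preg_split('/[ \t\r\n]+/', $value, -1, PREG_SPLIT_NO_EMPTY);
    return in_array($token, $tokens, true);
}

var_dump(containsToken("foo\tbar", 'foo')); // bool(true): tab separator
var_dump(containsToken('foo foo', 'foo'));  // bool(true): repeated class name
var_dump(containsToken('foobar', 'foo'));   // bool(false): substring is not a token
```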
In XPath 2.0 you can use:
//*[count(index-of(tokenize(@class, '\s+'), 'foo')) = 1]
as stated by Christian Weiske in:
https://cweiske.de/tagebuch/XPath%3A%20Select%20element%20by%20class.htm
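DOMXPath cannot evaluate tokenize or index-of either, so here is a hypothetical PHP mirror of that XPath 2.0 predicate, one line per function, to show what it computes. One consequence worth noticing: because the count must be exactly 1, a class attribute that repeats the name (such as "foo foo") does not match. PCRE's `\s` is also slightly broader than the XPath regex `\s`, so this is an approximation.

```php
<?php
// Hypothetical PHP mirror of the XPath 2.0 predicate
//   count(index-of(tokenize(@class, '\s+'), 'foo')) = 1
function hasClassOnce(string $class, string $name): bool
{
    // tokenize(@class, '\s+'): split on runs of whitespace
    // (PCRE's \s also covers form feed and vertical tab).
    $tokens = preg_split('/\s+/', $class);
    // index-of(...): positions of the tokens equal to $name
    $positions = array_keys($tokens, $name, true);
    // count(...) = 1
    return count($positions) === 1;
}

var_dump(hasClassOnce('foo bar', 'foo')); // bool(true)
var_dump(hasClassOnce('foobar', 'foo'));  // bool(false)
var_dump(hasClassOnce('foo foo', 'foo')); // bool(false): repeated name
```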
HTML allows case-insensitive element and attribute names, and class is a space-separated list of class names. Here we go for an img tag and the class named date:
//*['IMG' = translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')]/@*['CLASS' = translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') and contains(concat(' ', normalize-space(.), ' '), concat(' ', 'date', ' '))]
See also: CSS Selector to XPath conversion
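A sketch of running that expression with PHP's DOMXPath (DOM extension assumed; the markup is made up). As far as I know, libxml2's HTML parser, which DOMDocument::loadHTML uses, already lowercases tag and attribute names, so the translate() calls are a safeguard rather than a necessity there. Note that the expression selects the class attribute nodes, so the sketch appends /.. to reach the img elements.

```php
<?php
// Build the case-insensitive expression for <img> elements with class "date".
$upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$lower = 'abcdefghijklmnopqrstuvwxyz';
$expr = "//*['IMG' = translate(name(.), '$lower', '$upper')]"
      . "/@*['CLASS' = translate(name(.), '$lower', '$upper')"
      . " and contains(concat(' ', normalize-space(.), ' '), concat(' ', 'date', ' '))]";

$doc = new DOMDocument();
// Hypothetical markup mixing upper- and lower-case tag/attribute names.
@$doc->loadHTML('<p><IMG CLASS="date big" src="a.png"><img class="other"></p>');
$xpath = new DOMXPath($doc);

// The expression selects class attribute nodes...
foreach ($xpath->query($expr) as $attr) {
    echo $attr->nodeValue, "\n"; // prints "date big"
}
// ...so step up with /.. to get the <img> elements themselves.
echo $xpath->query($expr . '/..')->length, "\n"; // prints 1
```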
Beware of minus signs (hyphens) in the class name you query for! If you are querying for "my-ownclass" in this DOM:
<ul class="my-ownclass"><li>...</li></ul>
<ul class="someother"><li>...</li></ul>
<ul><li>...</li></ul>
$finder = new DOMXPath($dom);
$nodes = $finder->query(".//ul[contains(@class, 'my-ownclass')]"); // This will NOT behave as expected! This will strangely match all the <ul> elements in the DOM.
$nodes = $finder->query(".//ul[contains(@class, 'ownclass')]"); // This will match the element.