I am using DOMDocument. I am close, but I need help with the last little bit.
I have the HTML below (just a snippet).
There are a number of rows, and I am trying to get the href from each one.
So far I am doing the following:
I can get the table, tr, and td elements OK, but I'm not sure what to do from there.
Thanks for any help.
foreach ($dom->getElementsByTagName('table') as $tableitem) {
    if ( $tableitem->getAttribute('class') == 'tableStyle02'){
        $rows = $tableitem->getElementsByTagName('tr');
        foreach ($rows as $row){
            $cols = $row->getElementsByTagName('td');
            $hrefs = $cols->item(0)->getElementsByTagName('a');
        }
    }
}
HTML snippet:
<table width="100%" border="0" cellspacing="0" cellpadding="2" class="tableStyle02">
<tr>
<td><span class="Name"><a href="bin.php?cid=703&size=0">
<strong>Conference Facility</strong></a></span></td>
<td align="center" nowrap>0.00</td>
<td align="center"> 0 </td>
<td align="center"> </td>
<td align="center"> 0 </td>
<td align="center"> 0 </td>
<td align="center"> 0 - 0 </td>
<td align="center"> Wired Internet, </td>
<td align="center"> </td>
</tr>
Let me introduce you to XPath, a query language for DOM documents:
//table[@class="tableStyle02"]//a/@href
Read this as: take the table element whose class attribute is tableStyle02, and from it the href attribute of any a element inside it, at any depth.
Or, mirroring your foreach loops over the tr and td elements:
//table[@class="tableStyle02"]/tr/td/span/a/@href
In this path, every step is a direct child of the previous one: the a tag is a child of the span tag, which is a child of the td tag, which is a child of the tr tag, which is a child of the table tag. As you can see, XPath makes it much easier to express the path to an element than writing it all out in PHP code.
Speaking of PHP code, in PHP this can look like:
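libxml_use_internal_errors(true); // real-world HTML (e.g. an unescaped & in a link) often triggers parser warnings
$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);
$href = $xp->evaluate('string(//table[@class="tableStyle02"]//a/@href)');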
The variable $href then contains the string bin.php?cid=703&size=0.
This example wraps the path in string(...), so ->evaluate() returns a string, built from the first matching attribute node. Alternatively, you can get back a node list:
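To see the difference concretely, here is a minimal sketch that runs a few expressions through ->evaluate() against the question's first row (assumption: the & in the link is written as &amp; so the fragment parses cleanly). string() gives a PHP string, count() gives a float, and a bare location path gives a DOMNodeList, just as ->query() would.

```php
<?php
// Sketch: what DOMXPath::evaluate() returns depending on the expression.
// The markup is the question's first row, with & written as &amp;.
$html = '<table class="tableStyle02"><tr>'
      . '<td><span class="Name"><a href="bin.php?cid=703&amp;size=0">'
      . '<strong>Conference Facility</strong></a></span></td>'
      . '<td align="center">0.00</td>'
      . '</tr></table>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

// string(...): a PHP string built from the first matching node
$first = $xp->evaluate('string(//table[@class="tableStyle02"]//a/@href)');

// count(...): XPath numbers come back as a PHP float
$total = $xp->evaluate('count(//table[@class="tableStyle02"]//a/@href)');

// a plain location path: a DOMNodeList, same as ->query()
$nodes = $xp->evaluate('//table[@class="tableStyle02"]//a/@href');

// string(...) over an empty node-set: an empty string, not an error
$missing = $xp->evaluate('string(//table[@class="noSuchClass"]//a/@href)');

var_dump($first, $total, $nodes->length, $missing);
```

The empty-string case is worth keeping in mind: with string(...) a typo in the class name fails silently rather than raising anything.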
$hrefs = $xp->query('//table[@class="tableStyle02"]/tr/td/span/a/@href');
# note: ->query() instead of ->evaluate(), and the extra /span step because the a tag sits inside a span
Now $hrefs contains the usual DOMNodeList; here it holds all the href attribute nodes:
echo $hrefs->item(0)->nodeValue; # bin.php?cid=703&size=0
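item(0) only gives you the first link. To get every href, loop over the list. Here is a sketch wrapped in a hypothetical helper, collectHrefs(), run against the question's row plus a second, made-up row (cid=704, "Board Room") so there is more than one match. Each item in the list is a DOMAttr, and its nodeValue is the attribute text.

```php
<?php
// Sketch: collect every href from the table, not just item(0).
// collectHrefs() is a hypothetical helper name.
function collectHrefs(string $html): array
{
    libxml_use_internal_errors(true); // suppress warnings from messy real-world HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xp = new DOMXPath($doc);

    $hrefs = [];
    // Each matched node is a DOMAttr; nodeValue is the attribute's text.
    foreach ($xp->query('//table[@class="tableStyle02"]/tr/td/span/a/@href') as $attr) {
        $hrefs[] = $attr->nodeValue;
    }
    return $hrefs;
}

$html = '<table class="tableStyle02">'
      . '<tr><td><span class="Name"><a href="bin.php?cid=703&amp;size=0">'
      . '<strong>Conference Facility</strong></a></span></td><td>0.00</td></tr>'
      // the second row is made up for illustration
      . '<tr><td><span class="Name"><a href="bin.php?cid=704&amp;size=0">'
      . '<strong>Board Room</strong></a></span></td><td>0.00</td></tr>'
      . '</table>';

print_r(collectHrefs($html));
```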
Take care: if you separate tags with a single slash (/), each one must be a direct child of the previous one. With a double slash (//), it can be any descendant (a child, a child of a child, and so on).
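A quick way to check this is to count the matches for a few paths against the question's snippet. In that markup the a tag sits inside a span, so a path that expects a directly under td finds nothing, while the exact path and the // variant each find the one link. A sketch (the labels are just for display):

```php
<?php
// Sketch: how many href nodes each path finds in the question's snippet.
// The <a> is inside a <span>, so it is a grandchild of <td>, not a child.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<table class="tableStyle02"><tr>'
    . '<td><span class="Name"><a href="bin.php?cid=703&amp;size=0">'
    . '<strong>Conference Facility</strong></a></span></td></tr></table>');
$xp = new DOMXPath($doc);

$paths = [
    'td/a (child only)'      => '//table[@class="tableStyle02"]/tr/td/a/@href',
    'td/span/a (exact path)' => '//table[@class="tableStyle02"]/tr/td/span/a/@href',
    'td//a (any descendant)' => '//table[@class="tableStyle02"]/tr/td//a/@href',
];

$counts = [];
foreach ($paths as $label => $path) {
    $counts[$label] = $xp->query($path)->length;
    echo str_pad($label, 24), $counts[$label], PHP_EOL;
}
```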
You should be able to call getAttribute() on the individual DOMElement instances (just as you did in the second line of your example):
foreach ($hrefs as $a_node) {
if ($a_node->hasAttribute('href')) {
print $a_node->getAttribute('href');
}
}
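Putting that together with the loop from the question gives something like the sketch below. hrefsFromFirstCells() is a hypothetical helper name; it assumes the link lives in the first td of each row (as in the snippet), and the header row in the test markup is made up to show why a null check on item(0) is worth having.

```php
<?php
// Sketch: the question's table/tr/td loop, finished off with getAttribute().
// Assumes the link sits in the first <td> of each row, as in the snippet.
function hrefsFromFirstCells(DOMDocument $dom): array
{
    $result = [];
    foreach ($dom->getElementsByTagName('table') as $tableitem) {
        if ($tableitem->getAttribute('class') !== 'tableStyle02') {
            continue;
        }
        foreach ($tableitem->getElementsByTagName('tr') as $row) {
            $firstCell = $row->getElementsByTagName('td')->item(0);
            if ($firstCell === null) {
                continue; // e.g. a header row that only has <th> cells
            }
            // getElementsByTagName() searches all descendants, so the <span> wrapper is fine
            foreach ($firstCell->getElementsByTagName('a') as $a_node) {
                if ($a_node->hasAttribute('href')) {
                    $result[] = $a_node->getAttribute('href');
                }
            }
        }
    }
    return $result;
}

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML('<table class="tableStyle02">'
    . '<tr><th>Name</th><th>Rate</th></tr>' // made-up header row
    . '<tr><td><span class="Name"><a href="bin.php?cid=703&amp;size=0">'
    . '<strong>Conference Facility</strong></a></span></td><td>0.00</td></tr>'
    . '</table>');

print_r(hrefsFromFirstCells($dom));
```

Without the null check, the header row would make $cols->item(0) return null and the following method call would fail.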
You don't have to navigate your way down the DOM hierarchy to use getElementsByTagName; called on an element, it finds matching elements at any depth below it:
foreach ($dom->getElementsByTagName('table') as $tableitem) {
if ($tableitem->getAttribute('class') == 'tableStyle02'){
$links = $tableitem->getElementsByTagName("a");
}
}
At this point $links is a DOMNodeList, so you can iterate through it:
$hrefs = array(); // initialise once so links from every matching table are kept
foreach ($dom->getElementsByTagName('table') as $tableitem) {
    if ($tableitem->getAttribute('class') == 'tableStyle02'){
        $links = $tableitem->getElementsByTagName("a");
        foreach ($links as $link) {
            $hrefs[] = $link->getAttribute("href");
        }
    }
}
// Do things with $hrefs