getting src element using domDocument

Posted 2019-05-30 17:49

Question:

I am using DOMDocument. I am close, but I need help with the last step.

I have the HTML below (just a snippet; the real table has a number of rows). I am trying to get the href.

So far I am doing the following. I can get the table, tr, and td elements fine, but I am not sure what to do from there.

Thanks for any help.

foreach ($dom->getElementsByTagName('table') as $tableitem) {
    if ( $tableitem->getAttribute('class') == 'tableStyle02'){
        $rows = $tableitem->getElementsByTagName('tr');
        foreach ($rows as $row){ 
            $cols = $row->getElementsByTagName('td'); 

            $hrefs = $cols->item(0)->getElementsByTagName('a'); 
        }     
    }
}

HTML snippet:

<table width="100%" border="0" cellspacing="0" cellpadding="2" class="tableStyle02"> 
    <tr> 
        <td><span class="Name"><a href="bin.php?cid=703&size=0">
               <strong>Conference Facility</strong></a></span></td>
        <td align="center" nowrap>0.00</td>
        <td align="center">&nbsp;0&nbsp;</td>
        <td align="center">&nbsp;&nbsp;</td>
        <td align="center">&nbsp;0&nbsp;</td>
        <td align="center">&nbsp;0&nbsp;</td>
        <td align="center">&nbsp;0 - 0 &nbsp;</td>
        <td align="center">&nbsp;Wired Internet,&nbsp;&nbsp;&nbsp;</td>
        <td align="center">&nbsp;&nbsp;</td>
    </tr>

Answer 1:

Let me introduce you to XPath, a query language for DOM documents:

//table[@class="tableStyle02"]//a/@href

This reads as: take the table tag whose class attribute is tableStyle02, then the href attribute of every a tag anywhere inside it.

Or, since you also had foreach loops over the tr and td elements:

//table[@class="tableStyle02"]/tr/td/span/a/@href

In that path, the a tag is a direct child of the span tag, which is a direct child of the td tag, which is a direct child of the tr tag, which is a direct child of the table tag. As you can see, formulating the path in XPath is much easier than writing everything out in PHP code.

Speaking of PHP code, this can look like:

$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);
$href = $xp->evaluate('string(//table[@class="tableStyle02"]//a/@href)');

The variable $href then contains the string: bin.php?cid=703&size=0.


This example wraps the path in string(...), so ->evaluate returns a string built from the first matching attribute node. Alternatively, you can get a node list back:

$hrefs = $xp->query('//table[@class="tableStyle02"]/tr/td/span/a/@href');
#             ^^^^^                                       ^^^^

Now $hrefs is the usual DOMNodeList. Here it contains all the matching href attribute nodes:

echo $hrefs->item(0)->nodeValue; # bin.php?cid=703&size=0

Note that if you separate tags with a single slash /, each tag must be a direct child of the one before it. With two slashes // it can be any descendant (a child, a child of a child, and so on).
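To make the / versus // distinction concrete, here is a minimal sketch. The two-row markup is a hypothetical sample modelled on the question's snippet (the second row and its values are invented for illustration), and libxml_use_internal_errors() is only there to keep parser warnings quiet:

```php
<?php
// Hypothetical sample markup modelled on the question's snippet.
$html = <<<'HTML'
<table class="tableStyle02">
    <tr><td><span class="Name"><a href="bin.php?cid=703&amp;size=0"><strong>Conference Facility</strong></a></span></td><td>0.00</td></tr>
    <tr><td><span class="Name"><a href="bin.php?cid=704&amp;size=0"><strong>Meeting Room</strong></a></span></td><td>0.00</td></tr>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($doc);

// "//" matches <a> elements at any depth below the table.
$descendants = $xp->query('//table[@class="tableStyle02"]//a/@href');

// "/" requires every step to be a direct child, so the <span> must be spelled out.
$children = $xp->query('//table[@class="tableStyle02"]/tr/td/span/a/@href');

// Without the <span> step, no <a> is a direct child of a <td> in this markup.
$missing = $xp->query('//table[@class="tableStyle02"]/tr/td/a/@href');

// Collect the attribute values from the node list into a plain array.
$hrefs = [];
foreach ($descendants as $attr) {
    $hrefs[] = $attr->nodeValue;
}

print_r($hrefs);
echo "direct-child path with span: ", $children->length, " match(es)\n";
echo "direct-child path without span: ", $missing->length, " match(es)\n";
```

The // form is the more forgiving choice when the markup between the table and the link may change. The fully spelled-out form is stricter and stops matching as soon as a wrapper element is added or removed.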



Answer 2:

You should be able to use getAttribute() on the individual DOMElement instances (just as you did in the second line of your example):

foreach ($hrefs as $a_node) {
    if ($a_node->hasAttribute('href')) {
        print $a_node->getAttribute('href');
    }
}
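Put together with the loop from the question, this might look like the sketch below. The sample markup, the $found array, and the guard for rows without td cells are assumptions added for illustration; they are not part of the original code:

```php
<?php
// Hypothetical sample markup: a header row plus one data row.
$html = <<<'HTML'
<table class="tableStyle02">
    <tr><th>Name</th><th>Price</th></tr>
    <tr><td><span class="Name"><a href="bin.php?cid=703&amp;size=0"><strong>Conference Facility</strong></a></span></td><td>0.00</td></tr>
</table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$found = []; // hypothetical name for the collected href values

foreach ($dom->getElementsByTagName('table') as $tableitem) {
    if ($tableitem->getAttribute('class') !== 'tableStyle02') {
        continue; // not the table we are after
    }
    foreach ($tableitem->getElementsByTagName('tr') as $row) {
        $cols = $row->getElementsByTagName('td');
        if ($cols->length === 0) {
            continue; // e.g. a header row that only has <th> cells
        }
        // Look for links in the first cell only, as in the question.
        foreach ($cols->item(0)->getElementsByTagName('a') as $a_node) {
            if ($a_node->hasAttribute('href')) {
                $found[] = $a_node->getAttribute('href');
            }
        }
    }
}

print_r($found);
```

The length check matters because item(0) returns null when a row has no td cells, and calling getElementsByTagName() on null would be a fatal error.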


Answer 3:

You don't have to navigate your way down the DOM hierarchy to use getElementsByTagName:

foreach ($dom->getElementsByTagName('table') as $tableitem) {
    if ($tableitem->getAttribute('class') == 'tableStyle02'){
        $links = $tableitem->getElementsByTagName("a");
    }
}

At this point $links is a DOMNodeList, so you can iterate through it:

// Initialise once, outside the loop, so $hrefs always exists
// and is not reset for every matching table.
$hrefs = array();
foreach ($dom->getElementsByTagName('table') as $tableitem) {
    if ($tableitem->getAttribute('class') == 'tableStyle02'){
        $links = $tableitem->getElementsByTagName("a");
        foreach ($links as $link) {
            $hrefs[] = $link->getAttribute("href");
        }
    }
}
// Do things with $hrefs