PHP DOM: traverse HTML nodes and child nodes

Posted 2019-06-05 15:32

I am using some code to pick out all the <td> tags from an HTML page:

$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('td') as $node) {
    $array_data[] = $node->nodeValue;
}

This stores the data fine in my array.

The HTML being parsed is:

<tr>
<td>DATA 1</td>
<td><a href="12345">DATA 2</a></td>
<td>DATA 3</td> 
</tr>

The $array_data returns:

Array([0] => DATA 1 [1] => DATA 2 [2] => DATA 3)

I also want to get the code out of the href attribute of the <a> tag inside the <td>. Desired output:

Array([0] => DATA 1 [1] => 12345 [2] => DATA 2 [3] => DATA 3)

I think the <a> would be called a child node. I am very new to working with the DOM, so sorry if this seems like a stupid question.

I have read the SO question "Using PHP dom to get child elements".

I've used this code to pick out the href:

   foreach ($dom->getElementsByTagName('td') as $node) {
      foreach ($node->getElementsByTagName('a') as $node) {
         $link = $node->getAttribute('href');
         echo '<br>';
         echo $link;
      }
      $array_data[] = $node->nodeValue;
   }

Any help or pointers to other reading material would be greatly appreciated!
Thanks

1 answer
【Aperson】
#2 · 2019-06-05 16:15

You should check whether each td has an anchor as a child. Select the anchor tags inside it with getElementsByTagName() and check that the selection is non-empty using its length property. If the td contains an anchor, use getAttribute() to read its href attribute.

$dom = new DOMDocument;
$dom->loadHTML($html);
$array_data = [];
foreach ($dom->getElementsByTagName('td') as $node) {
    // Anchors nested anywhere inside this <td>
    $nodeAnchor = $node->getElementsByTagName("a");
    if ($nodeAnchor->length) {
        // Store the href of the first anchor before the cell text
        $array_data[] = $nodeAnchor->item(0)->getAttribute("href");
    }
    $array_data[] = $node->nodeValue;
}
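
For anyone who wants to try this end to end, here is a minimal self-contained sketch using the sample rows from the question. Some parts are my own assumptions rather than anything in the original code: the rows are wrapped in a <table> element so the parser sees a complete table, the helper name collectCells() is hypothetical, libxml warnings are silenced with libxml_use_internal_errors(), and trim() is applied to the cell text.

```php
<?php
// Sample markup from the question, wrapped in <table> (an assumption)
// so that the HTML parser sees a complete table.
$html = '<table><tr>
<td>DATA 1</td>
<td><a href="12345">DATA 2</a></td>
<td>DATA 3</td>
</tr></table>';

// Hypothetical helper: returns each cell's text, preceded by the href
// of the first <a> in that cell when there is one.
function collectCells(string $html): array
{
    $dom = new DOMDocument;
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $data = [];
    foreach ($dom->getElementsByTagName('td') as $td) {
        // Use a separate variable here so the outer $td is not overwritten.
        $anchors = $td->getElementsByTagName('a');
        if ($anchors->length > 0) {
            $data[] = $anchors->item(0)->getAttribute('href');
        }
        $data[] = trim($td->nodeValue);
    }
    return $data;
}

$array_data = collectCells($html);
print_r($array_data);
```

With the sample rows this should give DATA 1, 12345, DATA 2, DATA 3 in that order. Note that the inner lookup uses its own variable name; reusing $node for the inner loop, as in the question's attempt, overwrites the outer <td> node before its nodeValue is read.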

