Looping through a table with Simple HTML DOM

I'm using Simple HTML DOM to extract data from an HTML document, and I have a couple of issues that I need some help with.

On the line that begins with if ($td->find('a')), I want to extract the href and the text of the anchor node and store them in separate variables. The code, however, doesn't work (see the comments next to the echo statements in the code below).

What is the best way to do this? Note that I intend to build an XML document from this information later on, so I need the values in the correct order.

The links lead to pages containing detailed information about the individual cars (e.g. "Max speed", "Price", etc.) that I also want to extract into separate variables. How can I get at the data on these pages?

<?php
include 'simple_html_dom.php';

$html = new simple_html_dom();
$html = file_get_html('http://www.example.com/foo.html');

$items = array();

foreach ($html->find('table') as $table) {
    foreach ($table->find('tr') as $tr) {

        foreach ($tr->find('td') as $td) {

            if ($td->find('a')) {
                $link = $td->find('a.href');
                echo $link;  // empty

                $text = $td->find('a.text');
                echo $text; // Array
            }
            else {
                echo 'Name: ' . $td;
            }
        }
    }
}

The HTML document looks like this:

<div>
    <table>
        <tr>
            <td>
                <a href="car1.html" target="_blank">Car 1</a>
            </td>
            <td>
                Porsche
            </td>
        </tr>
        <tr>
            <td>
                <a href="car2.html" target="_blank">Car 2</a>
            </td>
            <td>
                Chrysler
            </td>
        </tr>
        ... and so on...

Tags: php dom web-scraping html-table simple-html-dom

2 Answers

干净又极端

#2 · 2019-07-21 12:55

'a.href' is a selector that matches an anchor tag with the CSS class href; it does not return the href attribute of the anchor. To read the attribute, fetch the element first and then access href as a property:

$link = $td->find('a', 0);
$href = $link->href;
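
To make the difference concrete, here is a minimal sketch that runs both lookups against one cell from the sample table. It assumes simple_html_dom.php sits next to the script, as in the question, and it parses an inline string with str_get_html() instead of fetching a page; the variable names are only illustrative.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory, as in the question.
include 'simple_html_dom.php';

// One row of the sample table, parsed from a string so no network call is needed.
$html = str_get_html(
    '<table><tr><td><a href="car1.html" target="_blank">Car 1</a></td></tr></table>'
);
$td = $html->find('td', 0);

// 'a.href' means "<a> elements with class href"; the sample contains none,
// so this returns an array with no matches.
$byClass = $td->find('a.href');
echo count($byClass), "\n";

// Passing an index returns a single element (or null if nothing matched)
// instead of an array of matches.
$link = $td->find('a', 0);
if ($link !== null) {
    $href = $link->href;        // value of the href attribute
    $text = $link->innertext;   // markup between <a> and </a>
    echo $href, "\n", $text, "\n";
}
```

The null check matters for cells without a link, such as the second cell in each row of the sample.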


叛逆

#3 · 2019-07-21 12:56

Use $td->find('a', 0)->href to read the anchor's href attribute and $td->find('a', 0)->innertext to read its contents. The second argument, 0, makes find() return the first matching element instead of an array of matches, which also acts as a safeguard if a cell contains more than one anchor.
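
Since the question mentions building an XML document later, the following is one possible sketch of the whole flow: collect one record per table row, then write the records out with PHP's bundled DOMDocument class. It assumes simple_html_dom.php is available as in the question and that every row has the two-cell layout of the sample. The element names cars, car, title and name are hypothetical, chosen only for illustration.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory, as in the question.
include 'simple_html_dom.php';

// Inline copy of the sample rows so the sketch runs without a network call.
$source = '<table>
  <tr><td><a href="car1.html" target="_blank">Car 1</a></td><td> Porsche </td></tr>
  <tr><td><a href="car2.html" target="_blank">Car 2</a></td><td> Chrysler </td></tr>
</table>';
$html = str_get_html($source);

// One associative array per row keeps the values together and in document order.
$items = array();
foreach ($html->find('tr') as $tr) {
    $item = array('href' => '', 'text' => '', 'name' => '');
    foreach ($tr->find('td') as $td) {
        $a = $td->find('a', 0);               // first anchor in the cell, or null
        if ($a !== null) {
            $item['href'] = $a->href;
            $item['text'] = trim($a->innertext);
        } else {
            $item['name'] = trim($td->plaintext);
        }
    }
    $items[] = $item;
}

// Build the XML; the element names here are hypothetical.
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;
$root = $doc->appendChild($doc->createElement('cars'));
foreach ($items as $item) {
    $car = $root->appendChild($doc->createElement('car'));
    $car->setAttribute('href', (string) $item['href']);
    // createTextNode() escapes characters such as & and < in the text.
    $title = $car->appendChild($doc->createElement('title'));
    $title->appendChild($doc->createTextNode((string) $item['text']));
    $name = $car->appendChild($doc->createElement('name'));
    $name->appendChild($doc->createTextNode((string) $item['name']));
}
$xml = $doc->saveXML();
echo $xml;
```

Collecting into $items first, rather than echoing inside the loop, keeps the scraping and the XML output separate, so the same array could later be extended with fields read from the linked detail pages.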

