How to scrape this with Simple HTML DOM [closed]

Posted 2020-04-01 06:58

Question:

I'm trying to use Simple HTML DOM to extract elements from a file that looks like this:

  • The file has several tables that look the same, each with class=sometable.
  • Each table has a few <tr class=sometr> rows.
  • Inside each tr there is a th that holds the title and a td that holds the category.

What I want to extract is every title (class=title) and its corresponding category number (class=category) for all table rows in all tables. I've loaded the file into $html. Can someone tell me what I should pass to find() after that? I even tried $collection = $html->find('tr'); and ran var_dump() on the collection, but got nothing, so it looks like I'm not selecting correctly.

<table class="sometable">
  <tbody>
    <tr class="sometr">
      <th><a class="title">Table 1 Title1</a></th>
      <td class="category" id="categ-113"></td>
      <td class="somename">Table 1 Title 1 name</td>
    </tr>
    <tr></tr>
    <tr></tr>                           
  </tbody>
</table>

<table class="sometable">
</table>

<table class="sometable">
</table>

Answer 1:

I have tested this and it works:

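// $html is the document loaded in the question (this snippet originally called it $dom).
$tables = $html->find('table[@class="sometable"]');

foreach ($tables as $table)
{
    // Echoing an element directly prints its outer HTML; ->plaintext gives only the text.
    $titles = $table->find('a[@class="title"]');
    foreach ($titles as $title)
    {
        echo "Link title = " . $title->plaintext . "<br />";
    }

    // The category number lives in the id attribute of the td (e.g. "categ-113").
    $categories = $table->find('td[@class="category"]');
    foreach ($categories as $category)
    {
        echo "Category id = " . $category->id . "<br />";
    }

    $titles2 = $table->find('td[@class="somename"]');
    foreach ($titles2 as $title2)
    {
        echo "Title2 = " . $title2->plaintext . "<br />";
    }
}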
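
The loops above print titles and categories in separate passes, while the question asks for each title together with its category. One way to get that pairing is to iterate over the rows instead of the tables and read both values from the same tr. This is only a sketch, not something from the original answer. It assumes simple_html_dom.php is available next to the script, uses an inline copy of the sample markup in place of the real file, and the names $markup and $rows are made up for illustration. It also checks that parsing succeeded, since an empty find() result can mean the document never loaded.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script; adjust the path as needed.
require_once 'simple_html_dom.php';

// Inline sample mirroring the structure shown in the question.
$markup = <<<HTML
<table class="sometable">
  <tbody>
    <tr class="sometr">
      <th><a class="title">Table 1 Title1</a></th>
      <td class="category" id="categ-113"></td>
      <td class="somename">Table 1 Title 1 name</td>
    </tr>
  </tbody>
</table>
HTML;

$html = str_get_html($markup);
if (!$html) {
    // The loader returns false when the input is empty or too large to parse.
    exit("Could not parse the markup\n");
}

$rows = array();
// Walk every tr.sometr inside every table.sometable.
foreach ($html->find('table.sometable tr.sometr') as $tr) {
    // Passing an index returns a single element, or null when nothing matches.
    $title    = $tr->find('a.title', 0);
    $category = $tr->find('td.category', 0);
    if ($title === null || $category === null) {
        continue; // skip rows that lack either piece
    }
    $rows[] = array(
        'title'    => trim($title->plaintext),
        // Strip the "categ-" prefix to keep only the number.
        'category' => str_replace('categ-', '', $category->id),
    );
}

print_r($rows);
```

For a real file, file_get_html() with the file path would replace str_get_html($markup). It can also return false, so the same check applies.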