HTML DOM Document parsing

Posted 2019-07-27 14:02

I am new to DOMDocument. I have this HTML:

    <tr class="calendar_row" data-eventid="39657">
        <td class="alt1 eventDate smallfont" align="center">Sun<div class="eventday_multiple">Dec 9</div></td>
        <td class="alt1 smallfont" align="center">3:34am</td>
        <td class="alt1 smallfont" align="center">USD</td>
    </tr>

    <tr class="calendar_row" data-eventid="39658">
        <td class="alt1 eventDate smallfont" align="center">Sun<div class="eventday_multiple">Dec 10</div></td>
        <td class="alt1 smallfont" align="center">5:14am</td>
        <td class="alt1 smallfont" align="center">EUR</td>
    </tr>

First, I am trying to get the contents of each <tr> using this code:

    $ret = array();
    libxml_use_internal_errors(true); 
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    //$doc->saveHTMLFile('textbox.php');

    $text = $doc->getElementsByTagName('tr');
    foreach ($text as $tag){
        $ret[] = $doc->saveHtml($tag); 
        echo $doc->saveHtml($tag); 
    }

I don't know why the whole document is echoed instead of just the contents of each <tr>.

Second, I would also like to get the values between the <td> tags, such as 5:14am, EUR, etc., but I have no idea how to do that.

Pardon the noob question.

Best Regards

2 answers
爷的心禁止访问
#2 · 2019-07-27 15:03
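    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $tables = $doc->getElementsByTagName('table');
    $table = $tables->item(0); // first table in the DOM, or null if there is none

    // <td> cells sit inside <tr>, so they are never direct children of the table;
    // search all descendants instead of walking $table->childNodes.
    $scope = ($table !== null) ? $table : $doc;
    foreach ($scope->getElementsByTagName('td') as $td) {
        echo $td->nodeValue, "\n";
    }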
Viruses.
#3 · 2019-07-27 15:03

Passing an element to saveHTML() generates the element's outerHTML, not its innerHTML, so you get its own tag with its attributes as well as all its content. Note that the node argument is only supported from PHP 5.3.6 onward.
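
If only the inner markup is wanted, one possible approach is to serialize each child node separately and join the results. The following is a minimal sketch of that idea; innerHTML() is a hypothetical helper name (it is not part of the DOM extension), and the wrapping <table> and the shortened row are assumptions made to keep the sample self-contained.

```php
<?php
// Hypothetical helper: serialize only the children of an element.
// Assumes $node belongs to a DOMDocument (ownerDocument is not null).
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument needs PHP >= 5.3.6
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Shortened sample row, wrapped in a <table> (an assumption for this sketch).
$html = '<table><tr class="calendar_row" data-eventid="39657">'
      . '<td align="center">3:34am</td><td align="center">USD</td>'
      . '</tr></table>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);

$row   = $doc->getElementsByTagName('tr')->item(0);
$outer = $doc->saveHTML($row); // includes the <tr ...> tag itself
$inner = innerHTML($row);      // only the <td> elements

echo $outer, "\n", $inner, "\n";
```

Here $outer still contains the <tr> tag, while $inner should contain only the two <td> elements.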

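The values inside a <td> can be obtained with $td->firstChild->nodeValue, or simply $td->textContent, where $td is the <td> in question. Keep in mind that firstChild->nodeValue returns only the first child node (just "Sun" for the date cell), whereas textContent also includes the text of nested elements.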
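
Putting this together for the rows in the question, the cell values could be collected per row roughly as follows. This is only a sketch: the surrounding <table> is an assumption (the question shows bare <tr> elements), and keying the result by data-eventid is an illustrative choice rather than something the question asks for.

```php
<?php
// The rows from the question, wrapped in a <table> (assumed for this sketch).
$html = <<<HTML
<table>
<tr class="calendar_row" data-eventid="39657">
    <td class="alt1 eventDate smallfont" align="center">Sun<div class="eventday_multiple">Dec 9</div></td>
    <td class="alt1 smallfont" align="center">3:34am</td>
    <td class="alt1 smallfont" align="center">USD</td>
</tr>
<tr class="calendar_row" data-eventid="39658">
    <td class="alt1 eventDate smallfont" align="center">Sun<div class="eventday_multiple">Dec 10</div></td>
    <td class="alt1 smallfont" align="center">5:14am</td>
    <td class="alt1 smallfont" align="center">EUR</td>
</tr>
</table>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);

$rows = array();
foreach ($doc->getElementsByTagName('tr') as $tr) {
    $cells = array();
    // Only <td> elements are returned, so whitespace text nodes are skipped.
    foreach ($tr->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->textContent);
    }
    // Key each row by its event id (illustrative choice).
    $rows[$tr->getAttribute('data-eventid')] = $cells;
}

print_r($rows);
```

Because textContent merges the text of nested elements, the date cell comes back as a single string with the weekday and the date joined together; read the nested <div> separately if the two parts are needed on their own.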
