DOM Manipulation with PHP

Posted 2019-08-12 15:21

I would like to do a simple but non-trivial manipulation of DOM elements with PHP, but I am lost.

Assume a page like Wikipedia, where you have paragraphs and titles (<p>, <h2>). They are siblings. I would like to extract both kinds of elements, in the order they appear in the document.

I have tried getElementsByTagName, but it returns each tag type separately, so there is no way to keep the elements in their original order. I have also tried DOMXPath->query(), but I found it really confusing.

Just parsing something like:

<html>
  <head></head>
  <body>
    <h2>Title1</h2>
    <p>Paragraph1</p>
    <p>Paragraph2</p>
    <h2>Title2</h2>
    <p>Paragraph3</p>
  </body>
</html>

into:

Title1
Paragraph1
Paragraph2
Title2
Paragraph3

There are a few bits of HTML in between that I do not need.

Thank you. I hope the question does not look like homework.

Tags: php dom parsing
3 answers
冷血范
Reply #2 · 2019-08-12 15:39

Try having a look at this library and corresponding project:

Simple HTML DOM

This lets you open an online web page or an HTML page from the filesystem and access its elements by class name, tag name, or ID. If you are familiar with jQuery and its syntax, you will get used to this library in no time.

Deceive 欺骗
Reply #3 · 2019-08-12 15:40

I have used Simple HTML DOM by S.C. Chen a few times.

It is a great class for accessing DOM elements.

Example:

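// Load the Simple HTML DOM library
// (assumes simple_html_dom.php sits next to this script)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images and print each src attribute
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}

// Find all links and print each href attribute
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}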

Check it out here: simplehtmldom

It may help with future projects.

迷人小祖宗
Reply #4 · 2019-08-12 16:02

I think DOMXPath->query() is the right approach. This XPath expression returns all nodes that are either an <h2> or a <p> and are direct children of <body> (since you said they were siblings).

/html/body/*[name() = 'p' or name() = 'h2']

The nodes will be returned as a node list in the right order (document order). You can then construct a foreach loop over the result.
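
As a rough sketch of that loop, assuming the page markup is already in a string: the variable names ($html, $lines) and the extra <div> are made up for illustration, with the <div> standing in for the bits of HTML that should be skipped. Something like this should print the titles and paragraphs in document order.

```php
<?php
// Sample markup from the question. The <div> is an assumption added here
// to represent unwanted HTML between the titles and paragraphs.
$html = <<<HTML
<html>
  <head></head>
  <body>
    <h2>Title1</h2>
    <p>Paragraph1</p>
    <div>Unwanted</div>
    <p>Paragraph2</p>
    <h2>Title2</h2>
    <p>Paragraph3</p>
  </body>
</html>
HTML;

// Real pages are rarely valid HTML, so collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select every <p> or <h2> that is a direct child of <body>.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query("/html/body/*[name() = 'p' or name() = 'h2']");

// The node list is in document order, so a plain foreach keeps
// titles and paragraphs interleaved as they appear on the page.
$lines = [];
foreach ($nodes as $node) {
    $lines[] = trim($node->textContent);
}

echo implode("\n", $lines), "\n";
```

For a real page you would load the markup with $doc->loadHTMLFile() or fetch it first. If the <p> and <h2> elements are nested deeper than <body>, the path would need adjusting, for example by starting the expression with // instead of /html/body/.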
