DOM Manipulation with PHP

I would like to do a simple but non-trivial manipulation of DOM elements with PHP, but I am lost.

Assume a page like Wikipedia, where you have paragraphs and titles (<p>, <h2>) that are siblings. I would like to extract both kinds of elements in the order in which they appear.

I have tried getElementsByTagName(), but then I have no way of keeping the results in their original order. I have also tried DOMXPath->query(), but I found it really confusing.
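A minimal sketch of the problem described above, assuming "GetElementbyName" refers to DOMDocument::getElementsByTagName() from PHP's built-in DOM extension. Each call returns its own DOMNodeList for a single tag name, so collecting <h2> and <p> elements this way groups them by tag instead of keeping them interleaved as on the page:

```php
<?php
// Sketch only: the sample HTML from the question, parsed with PHP's built-in DOM extension.
$html = '<html><head></head><body>'
      . '<h2>Title1</h2><p>Paragraph1</p><p>Paragraph2</p>'
      . '<h2>Title2</h2><p>Paragraph3</p>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// One getElementsByTagName() call per tag: each returns a separate DOMNodeList.
$lines = [];
foreach (['h2', 'p'] as $tag) {
    foreach ($doc->getElementsByTagName($tag) as $node) {
        $lines[] = $node->textContent;
    }
}

// Both titles come out first, then all paragraphs:
// the heading/paragraph interleaving of the page is lost.
echo implode("\n", $lines), "\n";
```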

I just want to parse something like:

<html>
  <head></head>
  <body>
    <h2>Title1</h2>
    <p>Paragraph1</p>
    <p>Paragraph2</p>
    <h2>Title2</h2>
    <p>Paragraph3</p>
  </body>
</html>

into:

Title1
Paragraph1
Paragraph2
Title2
Paragraph3

There are also a few bits of HTML I do not need in between these elements.

Thank you. I hope the question does not look like homework.

Tags: php dom parsing

3 answers

冷血范

Post #2 · 2019-08-12 15:39

Try having a look at this library and corresponding project:

Simple HTML DOM

It lets you open an online web page or an HTML page from the filesystem and access its elements by class name, tag name and ID. If you are familiar with jQuery and its selector syntax, you will get used to this library in no time.


Deceive 欺骗

Post #3 · 2019-08-12 15:40

I have used S.C. Chen's Simple HTML DOM a few times.

It is a great class for accessing DOM elements.

Example:

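// Load the library file distributed by the Simple HTML DOM project
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images and print their src attributes
foreach($html->find('img') as $element)
       echo $element->src . '<br>';

// Find all links and print their href attributes
foreach($html->find('a') as $element)
       echo $element->href . '<br>';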

Check it out here: simplehtmldom

It may help with future projects.


迷人小祖宗

Post #4 · 2019-08-12 16:02

I think DOMXPath->query() is the right approach. The following XPath expression returns every child element of <body> that is either an <h2> or a <p> (since you said they are siblings):

/html/body/*[name() = 'p' or name() = 'h2']

The nodes are returned as a DOMNodeList in document order, so you can simply loop over the result with foreach.
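A minimal sketch of this approach, assuming PHP's built-in DOM extension. The sample HTML is the one from the question, with one hypothetical <div> added to stand in for the unwanted bits of HTML; the XPath expression skips it because it only matches <h2> and <p> children of <body>:

```php
<?php
// Sample HTML from the question, plus a hypothetical <div> representing unwanted markup.
$html = <<<'HTML'
<html>
  <head></head>
  <body>
    <h2>Title1</h2>
    <p>Paragraph1</p>
    <div>not needed</div>
    <p>Paragraph2</p>
    <h2>Title2</h2>
    <p>Paragraph3</p>
  </body>
</html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
// Child elements of <body> named p or h2, returned in document order.
$nodes = $xpath->query("/html/body/*[name() = 'p' or name() = 'h2']");

// Collect the text of each matched element, keeping the page order.
$lines = [];
foreach ($nodes as $node) {
    $lines[] = trim($node->textContent);
}

echo implode("\n", $lines), "\n";
```

Using textContent flattens any inline markup (links, bold text) inside each element into plain text; if you need to know whether a node is a title or a paragraph, check $node->nodeName inside the loop.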
