I would like to do a simple but non-trivial manipulation of DOM elements with PHP, but I am lost.
Assume a page like Wikipedia where you have paragraphs and titles (<p>, <h2>) that are siblings. I would like to retrieve both kinds of elements in sequential (document) order.
I have tried getElementsByTagName(), but then I have no way to organize the information, because titles and paragraphs come back in separate lists.
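For comparison, here is a minimal sketch of one built-in way to keep the original order without XPath: fetch the single <body> element, then walk its child nodes (the siblings) in sequence and keep only <h2> and <p>. It assumes the sample HTML shown in this question and that the elements of interest are direct children of <body>; the variable names are illustrative only.

```php
<?php
// Sample markup from the question (assumed input).
$html = <<<HTML
<html>
<head></head>
<body>
<h2>Title1</h2>
<p>Paragraph1</p>
<p>Paragraph2</p>
<h2>Title2</h2>
<p>Paragraph3</p>
</body>
</html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

// There is only one <body>, so take the first match.
$body = $doc->getElementsByTagName('body')->item(0);

$parts = [];
// childNodes lists the siblings in document order.
foreach ($body->childNodes as $node) {
    // Skip whitespace text nodes and any element that is not h2 or p.
    if ($node instanceof DOMElement && in_array($node->nodeName, ['h2', 'p'], true)) {
        $parts[] = trim($node->textContent);
    }
}

$result = implode(' ', $parts);
echo $result, PHP_EOL;
```

Because the loop visits siblings one after another, titles and paragraphs stay interleaved exactly as they appear in the page.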
I have tried DOMXPath->query(), but I found it really confusing.
I just want to parse something like:
<html>
<head></head>
<body>
<h2>Title1</h2>
<p>Paragraph1</p>
<p>Paragraph2</p>
<h2>Title2</h2>
<p>Paragraph3</p>
</body>
</html>
into:
Title1 Paragraph1 Paragraph2 Title2 Paragraph3
with a few bits of HTML in between that I do not need.
Thank you. I hope the question does not look like homework.
Try having a look at this library and corresponding project:
Simple HTML DOM
This allows you to open an online web page or an HTML page from the filesystem and access its elements via class names, tag names and IDs. If you are familiar with jQuery and its syntax, you will get used to this library in no time.
I have used Simple HTML DOM by S.C. Chen a few times.
It is a perfect class for accessing DOM elements.
Example:
Check it out here: simplehtmldom
It may help with future projects.
I think DOMXPath->query() is the right approach. This XPath expression will return all nodes that are either an <h2> or a <p> on the same level (since you said they were siblings). The nodes are returned as a node list in the right order (document order), so you can then loop over the result with foreach.