Possible Duplicate:
Best methods to parse HTML with PHP
For a project I need to take an HTML page and extract all of its text and img tags, keeping them in the same order in which they appear on the page.
So for example, if the web page is:
<p>Hi</p>
<a href ="test.com" alt="a link"> text link</a>
<img src="test.png" />
<a href ="test.com"><img src="test2.png" /></a>
I would like to retrieve that information in this format:
text - Hi
Link1 - <a href ="test.com">text link</a> (note: without the alt or any other attribute)
Img1 - test.png
Link2 - <a href ="test.com"><img src="test2.png" /></a> (again, no extra attributes)
Is there a way to do that in PHP?
I would use an HTML Parser to pull the information out of the website. Get reading.
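As a rough illustration of the parser approach, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath. The function name list_text_and_images and the "text - " / "img - " labels are my own assumptions, not anything from the question, and it only lists text nodes and image sources (it does not rebuild the link markup). The XPath union is expected to return nodes in document order, which is what keeps the items in page order.

```php
<?php
// Hypothetical helper: lists text and image sources in document order.
function list_text_and_images(string $html): array
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $items = [];

    // Non-empty text nodes and <img> elements, merged into one node list.
    $query = '//body//text()[normalize-space()] | //body//img';
    foreach ($xpath->query($query) as $node) {
        if ($node instanceof DOMElement) {
            $items[] = 'img - ' . $node->getAttribute('src');
        } else {
            $items[] = 'text - ' . trim($node->nodeValue);
        }
    }

    return $items;
}

$page = '<p>Hi</p>
<a href ="test.com" alt="a link"> text link</a>
<img src="test.png" />
<a href ="test.com"><img src="test2.png" /></a>';

print_r(list_text_and_images($page));
```

Note that loadHTML() assumes ISO-8859-1 unless the markup declares a charset, so non-ASCII pages may need extra handling.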
Yes, you can first strip all tags you're not interested in and then use DOMDocument to remove all unwanted attributes. Finally, you need to re-run strip_tags to remove the tags added by DOMDocument: Demo
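The three steps could look roughly like the sketch below. This is not the linked demo, just my reading of the steps; the function name clean_html and the attribute whitelist (href on a, src on img) are assumptions based on the output the question asks for.

```php
<?php
// Hypothetical helper: keeps <a> and <img>, drops all other tags and attributes.
function clean_html(string $html): string
{
    // Step 1: strip every tag except the ones we care about.
    $html = strip_tags($html, '<a><img>');
    if (trim($html) === '') {
        return '';
    }

    // Step 2: parse the remainder so attributes can be edited safely.
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Attributes to keep per tag (an assumption, adjust as needed).
    $allowed = ['a' => ['href'], 'img' => ['src']];

    foreach ($allowed as $tag => $keep) {
        foreach ($dom->getElementsByTagName($tag) as $node) {
            // Copy the names first; removing from the live map while
            // iterating over it can skip attributes.
            $names = [];
            foreach ($node->attributes as $attr) {
                $names[] = $attr->nodeName;
            }
            foreach ($names as $name) {
                if (!in_array($name, $keep, true)) {
                    $node->removeAttribute($name);
                }
            }
        }
    }

    // Step 3: saveHTML() adds doctype, <html>, <body> and possibly <p>
    // wrappers, so run strip_tags once more to drop them.
    return trim(strip_tags($dom->saveHTML(), '<a><img>'));
}

$page = '<p>Hi</p>
<a href ="test.com" alt="a link"> text link</a>
<img src="test.png" />
<a href ="test.com"><img src="test2.png" /></a>';

echo clean_html($page), "\n";
```

The cleaned string keeps the text, links and images in their original order, so it can then be split or walked to produce the numbered list from the question.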