I'm using PHP Simple HTML DOM Parser to get text from a webpage. The page I need to manipulate is something like:
<html>
<head>
<title>title</title>
<body>
<div id="content">
<h1>HELLO</h1>
Hello, world!
</div>
</body>
</html>
I need to get the h1 element and the text that has no tags. To get the h1 I use this code:
$html = file_get_html("remote_page.html");
foreach($html->find('#content') as $text){
echo "H1: ".$text->find('h1', 0)->plaintext;
}
But how do I get the other text? I also tried this inside the foreach, but I get the full text:
$text->plaintext;
but that also returned the text of the H1 element...
It looks like
$text->find('text',2);
gets what you're looking for; however, I'm not sure how well that will work when the number of text nodes is unknown. I'll keep looking.

You can simply strip HTML tags using strip_tags.
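
For the case where the number of text nodes is unknown, one option is to walk the direct children of the content div and keep only the non-empty text nodes. Below is a minimal sketch of that idea. Note that it uses PHP's built-in DOMDocument and DOMXPath (the dom extension) rather than Simple HTML DOM, so it is a swapped-in technique and not the library from the question. The markup is modeled on the page in the question, and the variable names are illustrative.

```php
<?php
// Markup modeled on the page in the question (illustrative copy).
$markup = <<<HTML
<html>
<head>
<title>title</title>
</head>
<body>
<div id="content">
<h1>HELLO</h1>
Hello, world!
</div>
</body>
</html>
HTML;

$doc = new DOMDocument();
// Suppress warnings libxml may raise on loosely formed HTML.
@$doc->loadHTML($markup);

// Locate the element whose id is "content".
$xpath = new DOMXPath($doc);
$content = $xpath->query('//*[@id="content"]')->item(0);

$h1 = '';
$looseText = [];
foreach ($content->childNodes as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === 'h1') {
        // Text inside the h1 element.
        $h1 = trim($child->textContent);
    } elseif ($child->nodeType === XML_TEXT_NODE && trim($child->nodeValue) !== '') {
        // Text that sits directly in the div, outside any child tag.
        $looseText[] = trim($child->nodeValue);
    }
}

echo "H1: ", $h1, PHP_EOL;
echo "Text: ", implode(' ', $looseText), PHP_EOL;
```

Because only direct child text nodes are collected, the result does not depend on knowing a fixed index such as 2 in advance; text nested inside other child elements is deliberately ignored.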
Use strip_tags, as @Peachy pointed out. However, passing it a second argument of <br> means strip_tags will ignore <br> tags, which is unnecessary. In your case, calling it without the second argument would work as you'd like, given that you are only selecting content in the content id.
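
To make the effect of the second argument concrete, here is a small sketch of strip_tags with and without it. The fragment is a hypothetical stand-in for the inner HTML of the content div; the <br> and the "Second line" text are added purely for illustration and are not in the original page.

```php
<?php
// Hypothetical fragment standing in for the inner HTML of the content div.
$fragment = "<h1>HELLO</h1>\nHello, world!<br>Second line";

// No second argument: every tag is removed, including <br>.
$allStripped = strip_tags($fragment);

// Second argument '<br>': <br> tags are kept, all other tags are removed.
$brKept = strip_tags($fragment, '<br>');

echo $allStripped, PHP_EOL;
echo $brKept, PHP_EOL;
```

Keep in mind that strip_tags removes the tags themselves but keeps the text inside them, so the h1's text ("HELLO") is still part of both results.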