I am using Simple HTML DOM to scrape a website. The problem I have run into is that there is text positioned outside of any specific element. The only element it seems to be inside is <div id="content">.
<div id="content">
<div class="image-wrap"></div>
<div class="gallery-container"></div>
<h3 class="name">Here is the Heading</h3>
All the text I want is located here !!!
<p> </p>
<div class="snapshot"></div>
</div>
I guess the webmaster has messed up and the text should actually be inside the <p> tags.
I've tried using this code below, however it just won't retrieve the text:
$t = $scrape->find("div#content text", 0);
if ($t != null) {
    $text = trim($t->plaintext);
}
I'm still a newbie and still learning. Can anyone help?
You're almost there... Use a test loop to display the content of your nodes and locate the index of the wanted text. For example:
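A minimal sketch of such a test loop. It assumes the Simple HTML DOM library is available as simple_html_dom.php next to the script, and it parses the markup from the question as an inline string; on the real site you would load the page (for example with file_get_html()) instead. The names $markup and $dump are only illustrative.

```php
<?php
// Assumption: the Simple HTML DOM library file sits next to this script.
require_once 'simple_html_dom.php';

// Markup copied from the question, used here in place of the live page.
$markup = <<<HTML
<div id="content">
<div class="image-wrap"></div>
<div class="gallery-container"></div>
<h3 class="name">Here is the Heading</h3>
All the text I want is located here !!!
<p> </p>
<div class="snapshot"></div>
</div>
HTML;

$scrape = str_get_html($markup);

// Collect every text node found under div#content, keyed by its index.
$dump = array();
foreach ($scrape->find('div#content text') as $index => $node) {
    $dump[$index] = $node->plaintext;
}

// Print "index => [content]" so the index of the wanted text is easy to spot.
// Whitespace-only text nodes show up as empty brackets.
foreach ($dump as $index => $content) {
    echo $index . ' => [' . trim($content) . "]\n";
}
```

The whitespace between tags counts as text nodes too, which is why the wanted text is not at index 0.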
You'll find that you should use index 4 instead of 0, that is, find("div#content text", 4).
And if your text doesn't always have the same index, but you know, for example, that it follows the h3 heading, then you could use something like:
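One possible way to do this, as a sketch rather than the only approach: walk the direct child nodes of div#content and take the first non-empty text node that comes after the h3. It relies on two assumptions about Simple HTML DOM: that each element exposes its child nodes (text nodes included) through the public nodes array, and that text nodes report their tag as 'text'. The markup is again the sample from the question, and $afterHeading is just an illustrative flag.

```php
<?php
// Assumption: the Simple HTML DOM library file sits next to this script.
require_once 'simple_html_dom.php';

// Markup copied from the question, used here in place of the live page.
$markup = <<<HTML
<div id="content">
<div class="image-wrap"></div>
<div class="gallery-container"></div>
<h3 class="name">Here is the Heading</h3>
All the text I want is located here !!!
<p> </p>
<div class="snapshot"></div>
</div>
HTML;

$scrape  = str_get_html($markup);
$content = $scrape->find('div#content', 0);

$text         = null;
$afterHeading = false;

if ($content !== null) {
    // Assumption: ->nodes lists all direct children, text nodes included.
    foreach ($content->nodes as $node) {
        if ($node->tag === 'h3') {
            // Everything from here on comes after the heading.
            $afterHeading = true;
            continue;
        }
        // Take the first text node after the h3 that is not just whitespace.
        if ($afterHeading && $node->tag === 'text' && trim($node->plaintext) !== '') {
            $text = trim($node->plaintext);
            break;
        }
    }
}

echo $text === null ? "No text found after the heading\n" : $text . "\n";
```

This does not depend on how many text nodes come before the heading, so it should keep working if the number of elements above the h3 changes.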