I'm using Symfony, Goutte, and DomCrawler to scrape a page. Unfortunately, the page is built from many old-fashioned data tables with no IDs, classes, or other identifying attributes. So I'm trying to find one table by parsing the source I get back from the request, but I can't seem to access any of the information.
I think that when I filter, only the first node gets filtered, and my data isn't in that node, so nothing is returned.
So I have a $crawler object, and I've tried the following loop to get what I want:
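// Requires: use Symfony\Component\DomCrawler\Crawler;
// each() returns an array of the callback's return values, so $title is an array.
$title = $crawler->filterXPath('//td[. = "Title"]/following-sibling::td[1]')->each(function (Crawler $node, $i) {
    return $node->text();
});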
I'm not sure what Crawler $node means; I just copied it from the example on the web page. My hope is that once this works, it will loop through each node in the $crawler object and find what I'm actually looking for.
Here's an example of the page:
<table>
<tr>
<td>Title</td>
<td>The Harsh Face of Mother Nature</td>
<td>The Harsh Face of Mother Nature</td>
</tr>
.
.
.
</table>
And this is just one table. There are many more tables on the page, with a huge sloppy mess of markup around them. Any ideas?
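To rule out the XPath expression itself, here is a minimal sketch that runs the same expression against the sample table using PHP's built-in DOMDocument and DOMXPath instead of DomCrawler. The $html string and the variable names are made up for illustration, and it assumes the real cell text is exactly "Title" with no surrounding whitespace.

```php
<?php
// Sketch only: uses PHP's built-in DOM extension, not DomCrawler.
// $html is a trimmed copy of the sample table; all names are illustrative.
$html = <<<HTML
<table>
<tr>
<td>Title</td>
<td>The Harsh Face of Mother Nature</td>
<td>The Harsh Face of Mother Nature</td>
</tr>
</table>
HTML;

// Collect libxml parse warnings internally instead of printing them,
// since real-world markup is often sloppy.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Same expression as in the filterXPath() call above.
$nodes = $xpath->query('//td[. = "Title"]/following-sibling::td[1]');

// Gather the text of every matching cell.
$titles = [];
foreach ($nodes as $node) {
    $titles[] = trim($node->textContent);
}

print_r($titles);
```

For the sample table this yields a single entry, the first cell after "Title", so the expression does match when the markup looks like the example. If the real page has extra whitespace inside the cell, something like normalize-space(.) = "Title" may be needed instead, but that is a guess on my part.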
(Note: earlier I was able to apply a filter to the $crawler object for some information I needed, then serialize() the result and finally get a string, which made sense. But now I can't get a string at all, and I don't know why.)