I am trying to find all links in a div and then printing those links.
I am using the Simple HTML Dom to parse the HTML file. Here is what I have so far, please read the inline comments and let me know where I am going wrong.
include('simple_html_dom.php');
$html = file_get_html('tester.html');
$articles = array();
//find the div the div with the id abcde
foreach($html->find('#abcde') as $article) {
//find all a tags that have a href in the div abcde
foreach($article->find('a[href]') as $link){
//if the href contains singer then echo this link
if(strstr($link, 'singer')){
echo $link;
}
}
}
What currently happens is that the above takes a long time to load (never got it to finish). I printed what it was doing in each loop since it was too long to wait and I find that its going through things I don't need it to! This suggests my code is wrong.
The HTML is basically something like this:
<div id="abcde">
<!-- lots of html elements -->
<!-- lots of a tags -->
<a href="singer/tom" />
<img src="image..jpg" />
</a>
</div>
Thanks all for any help
The correct way to select a div (or whatever) by ID using that API is:
$html->find('div[id=abcde]');
Also, since IDs are supposed to be unique, the following should suffice:
//find all a tags that have a href in the div abcde
$article = $html->find('div[id=abcde]', 0);
foreach($article->find('a[href]') as $link){
//if the href contains singer then echo this link
if(strstr($link, 'singer')){
echo $link;
}
}
Why don't you use the built-in DOM extension instead?
<?php
$cont = file_get_contents("http://stackoverflow.com/") or die("1");
$doc = new DOMDocument();
@$doc->loadHTML($cont) or die("2");
$nodes = $doc->getElementsByTagName("a");
for ($i = 0; $i < $nodes->length; $i++) {
$el = $nodes->item($i);
if ($el->hasAttribute("href"))
echo "- {$el->getAttribute("href")}\n";
}
gives
... (lots of links before) ...
- http://careers.stackoverflow.com
- http://serverfault.com
- http://superuser.com
- http://meta.stackoverflow.com
- http://www.howtogeek.com
- http://doctype.com
- http://creativecommons.org/licenses/by-sa/2.5/
- http://www.peakinternet.com/business/hosting/colocation-dedicated#
- http://creativecommons.org/licenses/by-sa/2.5/
- http://blog.stackoverflow.com/2009/06/attribution-required/