How can I parse the background images and other images of a web page with PHP (Simple HTML DOM, etc.)?
case 1: inline css
<div id="id100" style="background:url(/mycar1.jpg)"></div>
case 2: css inside html page
<div id="id100"></div>
<style type="text/css">
#id100{
background:url(/mycar1.jpg);
}
</style>
case 3: separate css file
<div id="id100"></div>
external.css
#id100{
background:url(/mycar1.jpg);
}
case 4: image inside img tag
Solution to case 4, as it appears in the PHP Simple HTML DOM Parser documentation:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
Please help me parse cases 1, 2 and 3.
If there are more cases, please list them, with a solution if you can.
Thanks
For Case 1:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Get the style attribute for the item
$style = $html->getElementById("id100")->getAttribute('style');
// $style is now: background:url(/mycar1.jpg)
// To get the URL out of it, run the value through a CSS parser or a regular expression.
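The "regular expression magic" mentioned above could look roughly like the sketch below. The helper name extractCssUrls is made up for illustration, and the pattern only aims at plain url(...) values (quoted or unquoted); a real CSS parser would be more robust against comments, escapes and unusual formatting.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): return every url(...)
// value found in a CSS declaration string.
function extractCssUrls(string $css): array
{
    // Matches url(x), url('x') and url("x"). The \1 backreference makes the
    // closing quote, if any, match the opening one.
    preg_match_all('/url\(\s*([\'"]?)(.*?)\1\s*\)/i', $css, $matches);

    // Group 2 holds the URL without the surrounding quotes.
    return $matches[2];
}

// The style attribute value from case 1.
$style = 'background:url(/mycar1.jpg)';
print_r(extractCssUrls($style));
```

The returned paths are exactly as written in the CSS, so a relative path such as /mycar1.jpg still has to be resolved against the page URL before it can be downloaded.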
For Case 2/3:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Case 2: get the <style> elements in the head
$styles = $html->find('head',0)->find('style');
// Case 3: get the links to external stylesheets
$links = $html->find('link[rel=stylesheet]');
// $styles is an array of <style> elements; pass each one's innertext to a CSS parser.
// For each entry in $links, download the CSS file named by its href attribute and parse it the same way.
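As a rough sketch of the case 2/3 approach, the snippet below gathers the CSS sources of a page. It uses PHP's built-in DOMDocument rather than Simple HTML DOM, a swap made only so the example runs without an extra library. It assumes external stylesheets are referenced with <link rel="stylesheet" href="...">. The helper name collectCssSources and the sample markup are made up for illustration, and nothing is downloaded: external files are only listed.

```php
<?php
// Hypothetical helper: collect the text of <style> blocks and the hrefs of
// linked stylesheets from an HTML string.
function collectCssSources(string $html): array
{
    $doc = new DOMDocument();
    // Suppress the warnings that real-world markup tends to trigger.
    @$doc->loadHTML($html);

    // Case 2: CSS embedded in the page.
    $inlineCss = [];
    foreach ($doc->getElementsByTagName('style') as $styleEl) {
        $inlineCss[] = $styleEl->textContent;
    }

    // Case 3: CSS in separate files.
    $externalHrefs = [];
    foreach ($doc->getElementsByTagName('link') as $linkEl) {
        if (strtolower($linkEl->getAttribute('rel')) === 'stylesheet') {
            $externalHrefs[] = $linkEl->getAttribute('href');
        }
    }

    return ['inline' => $inlineCss, 'external' => $externalHrefs];
}

$page = '<html><head><link rel="stylesheet" href="external.css">'
      . '<style type="text/css">#id100{background:url(/mycar1.jpg);}</style>'
      . '</head><body><div id="id100"></div></body></html>';

$sources = collectCssSources($page);

// Pull the url(...) values out of each embedded stylesheet.
foreach ($sources['inline'] as $css) {
    preg_match_all('/url\(\s*([\'"]?)(.*?)\1\s*\)/i', $css, $m);
    print_r($m[2]);
}

// These files would still need to be fetched and scanned the same way.
print_r($sources['external']);
```

Each href in the external list would then be resolved against the page URL, fetched (for example with file_get_contents), and scanned with the same pattern. Note that url() paths inside an external stylesheet are relative to that stylesheet, not to the page.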
To extract <img> elements from the page, you can try something like:
$doc = new DOMDocument();
$doc->loadHTML("<html><body>Foo<br><img src=\"bar.jpg\" title=\"Foo bar\" alt=\"alt\"></body></html>");
$xml = simplexml_import_dom($doc);
$images = $xml->xpath('//img');
foreach ($images as $img)
echo $img['src'] . ' ' . $img['alt'] . ' ' . $img['title'];
See the PHP documentation for DOMDocument for more details.