I need to find all anchor tags that have an img
tag as a descendant (not necessarily a direct child). Consider the following cases:
<a href="test1.php">
    <img src="test1.jpg" alt="Test 1" />
</a>
<a href="test2.php">
    <span>
        <img src="test2.jpg" alt="Test 2" />
    </span>
</a>
My requirement is to generate a list of href attributes along with the corresponding src and alt values, i.e.:
$output = array(
    array(
        'href' => 'test1.php',
        'src' => 'test1.jpg',
        'alt' => 'Test 1'
    ),
    array(
        'href' => 'test2.php',
        'src' => 'test2.jpg',
        'alt' => 'Test 2'
    )
);
How can I match the above cases in PHP (using DOMXPath or any other DOM parser)?
Thanks in Advance!
Assuming $doc is a DOMDocument representing your HTML document:
$output = array();
$xpath = new DOMXPath($doc);
# find each img inside a link
foreach ($xpath->query('//a[@href]//img') as $img) {
    # find the link by going up until an <a> is found
    # since we only found <img>s inside an <a>, this should always succeed
    for ($link = $img; $link->tagName !== 'a'; $link = $link->parentNode);
    $output[] = array(
        'href' => $link->getAttribute('href'),
        'src' => $img->getAttribute('src'),
        'alt' => $img->getAttribute('alt'),
    );
}
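For completeness, here is a minimal sketch of how $doc might be built from the markup in the question. It assumes the HTML is available as a string (the $html variable is only an illustration) and, as a variation on the loop above, uses the XPath ancestor axis to look up the enclosing link instead of walking parentNode by hand. Treat it as one possible arrangement rather than the only way to do it.

```php
<?php
// Sample markup taken from the question; $html is an illustrative name.
$html = <<<HTML
<a href="test1.php">
<img src="test1.jpg" alt="Test 1" />
</a>
<a href="test2.php">
<span>
<img src="test2.jpg" alt="Test 2" />
</span>
</a>
HTML;

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them,
// since real-world HTML is rarely perfectly formed.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$output = array();
foreach ($xpath->query('//a[@href]//img') as $img) {
    // ancestor::a[@href][1] selects the nearest enclosing <a> that has an href,
    // evaluated relative to the current <img> node.
    $link = $xpath->query('ancestor::a[@href][1]', $img)->item(0);
    $output[] = array(
        'href' => $link->getAttribute('href'),
        'src'  => $img->getAttribute('src'),
        'alt'  => $img->getAttribute('alt'),
    );
}
print_r($output);
```

With the two cases from the question, this should produce one entry per image, including the one wrapped in a span.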
Assuming your HTML is a valid XML document (has a single root node, etc.), you can use SimpleXML like this:
$xml = simplexml_load_file($filename);
$items = array();
foreach ($xml->xpath('//a[@href]') as $anchor) {
    foreach ($anchor->xpath('.//img[@src][@alt]') as $img) {
        $items[] = array(
            'href' => (string) $anchor['href'],
            'src' => (string) $img['src'],
            'alt' => (string) $img['alt'],
        );
    }
}
print_r($items);
This uses XPath to search the document for all <a> tags that have an href attribute. Then it searches under each <a> tag found for any <img> tags, at any depth, that have both src and alt attributes. Finally, it grabs the needed attributes and adds them to the array. Note that an <img> missing either attribute is skipped entirely.
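If the markup is not well-formed XML (the fragment in the question, for instance, has two top-level elements and no single root), one possible workaround is to parse it with DOMDocument first and hand the result to SimpleXML via simplexml_import_dom(). The sketch below assumes the HTML is held in a string; the $html variable is only illustrative.

```php
<?php
// Markup from the question, as a string ($html is an illustrative name).
$html = '<a href="test1.php"><img src="test1.jpg" alt="Test 1" /></a>'
      . '<a href="test2.php"><span><img src="test2.jpg" alt="Test 2" /></span></a>';

// DOMDocument::loadHTML() tolerates HTML that is not valid XML
// and wraps the fragment in <html><body>, giving it a single root.
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Convert the parsed DOM into a SimpleXMLElement.
$xml = simplexml_import_dom($doc);

$items = array();
foreach ($xml->xpath('//a[@href]') as $anchor) {
    // './/img' matches images at any depth below the current anchor.
    foreach ($anchor->xpath('.//img[@src][@alt]') as $img) {
        $items[] = array(
            'href' => (string) $anchor['href'],
            'src'  => (string) $img['src'],
            'alt'  => (string) $img['alt'],
        );
    }
}
print_r($items);
```

The loop itself is unchanged from the answer above; only the loading step differs.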
Use Simple HTML DOM Parser: http://simplehtmldom.sourceforge.net/
You can do something like this (rough code; you will have to tune it to get it to work):
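// include Simple HTML DOM Parser (adjust the path to your copy of the library)
include 'simple_html_dom.php';
$html = file_get_html('your html file here');
$output = array();
foreach ($html->find('a') as $a) {
    // an <a> has no src/alt of its own, so read them from the <img> tags inside it
    foreach ($a->find('img') as $img) {
        $output[] = array(
            'href' => $a->href,
            'src' => $img->src,
            'alt' => $img->alt,
        );
    }
}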