How to extract img src, title and alt from html us

Posted 2018-12-31 03:24

I would like to create a page that lists every image on my website together with its title and alternative text (alt).

I have already written a little program that finds and loads all the HTML files, but now I am stuck on how to extract src, title and alt from HTML like this:

<img src="/image/fluffybunny.jpg" title="Harvey the bunny" alt="a cute little fluffy bunny" />

I guess this should be done with a regex, but since the order of the attributes may vary and I need all of them, I don't really know how to parse this in an elegant way (I could do it the hard, char-by-char way, but that's painful).
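A sketch of the parser-based route, using PHP's built-in DOMDocument instead of a regex. Because DOMDocument looks attributes up by name, the order of src, title and alt inside the tag does not matter. The function name extractImages and the sample markup are illustrative only.

```php
<?php
// Collect src, title and alt from every <img> in an HTML string.
// Attributes are read by name, so their order in the tag is irrelevant.
function extractImages(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $images = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $images[] = [
            'src'   => $img->getAttribute('src'),   // '' when the attribute is absent
            'title' => $img->getAttribute('title'),
            'alt'   => $img->getAttribute('alt'),
        ];
    }
    return $images;
}

$html = '<p><img src="/image/fluffybunny.jpg" title="Harvey the bunny" alt="a cute little fluffy bunny" />'
      . '<img alt="reversed order" src="/image/other.png"></p>';
print_r(extractImages($html));
```

getAttribute() returns an empty string for a missing attribute, so an image without a title simply gets ''. For UTF-8 pages, note that loadHTML() assumes ISO-8859-1 unless the markup declares its charset.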

Reply #2 · 2018-12-31 04:11
$content = "<img src='' width='40' height='40'>";
// The pattern has to mirror the tag exactly: same attribute order, same quote style
$count = preg_match_all("~<img\s+src='(.*?)'\s+width='(.*?)'\s+height='(.*?)'~is", $content, $matches);
// $matches[1] = src values, $matches[2] = widths, $matches[3] = heights
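A fixed-order pattern like the one in this reply only matches tags whose attributes appear in exactly that order and quote style. An order-independent variant, sketched below under the assumption that every attribute value is quoted, first finds each <img> tag and then collects all of its name=value pairs into an associative array (imgAttributes is a made-up name). It still has the usual regex limits: for example, a > inside an attribute value breaks the tag match.

```php
<?php
// Parse every name="value" / name='value' pair of each <img> tag into an
// associative array, so attribute order and quote style no longer matter.
// Regex-based: unquoted values and '>' inside values are not handled.
function imgAttributes(string $content): array
{
    $result = [];
    preg_match_all('/<img\b[^>]*>/i', $content, $tags);
    foreach ($tags[0] as $tag) {
        // Group 2 captures the opening quote; \2 requires the same quote to close.
        preg_match_all('/([\w-]+)\s*=\s*(["\'])(.*?)\2/s', $tag, $pairs, PREG_SET_ORDER);
        $attrs = [];
        foreach ($pairs as $pair) {
            $attrs[strtolower($pair[1])] = $pair[3];
        }
        $result[] = $attrs;
    }
    return $result;
}

$content = "<img src='a.png' width='40' height='40'><img height=\"10\" src=\"b.png\">";
print_r(imgAttributes($content));
```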
Reply #3 · 2018-12-31 04:13

You can use simplehtmldom (PHP Simple HTML DOM Parser), which supports most jQuery-style selectors. For example:

// Load the library (simple_html_dom.php from the simplehtmldom download)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('');

// Find all images
foreach ($html->find('img') as $element)
    echo $element->src . '<br>';

// Find all links
foreach ($html->find('a') as $element)
    echo $element->href . '<br>';
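If you would rather not add a third-party library, PHP's built-in DOMXPath can express the same selections as simplehtmldom's find(). This is an alternative, not part of simplehtmldom: the helper xpathValues is hypothetical, and it assumes each query selects attribute nodes (such as //img/@src), whose nodeValue is the attribute's value.

```php
<?php
// Run an XPath query that selects attribute nodes and return their values.
// '//img/@src' plays the role of find('img') + ->src, '//a/@href' of find('a') + ->href.
function xpathValues(string $html, string $query): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $values = [];
    foreach ((new DOMXPath($doc))->query($query) as $node) {
        $values[] = $node->nodeValue; // for an attribute node this is the attribute's value
    }
    return $values;
}

$html = '<a href="/bunnies.html"><img src="/image/fluffybunny.jpg" alt="bunny"></a>'
      . '<img src="/image/noalt.jpg">';

print_r(xpathValues($html, '//img/@src'));             // every image source
print_r(xpathValues($html, '//a/@href'));              // every link target
print_r(xpathValues($html, '//img[not(@alt)]/@src'));  // images without alt text
```

The //img[not(@alt)]/@src query fits the question's goal well, since it lists the images that have no alt text at all.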
Reply #4 · 2018-12-31 04:16

I used preg_match to do it.

In my case, I had a string containing exactly one <img> tag (and no other markup), which I got from WordPress, and I needed its src attribute so I could run it through TimThumb.

// get the featured image
$image = get_the_post_thumbnail($photos[$i]->ID);

// get the src for that image
$pattern = '/src="([^"]*)"/';
preg_match($pattern, $image, $matches);
$src = $matches[1];

To grab the title or the alt instead, just change the pattern: use $pattern = '/title="([^"]*)"/'; for the title or $pattern = '/alt="([^"]*)"/'; for the alt. Sadly, my regex isn't good enough to grab all three (alt/title/src) in one pass.
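Grabbing all three in one pass is possible if the pattern captures the attribute name as well as its value, and the results are then keyed by name. A minimal sketch, assuming double-quoted values as in the WordPress output above; srcTitleAlt is a hypothetical helper name. Requiring whitespace before the name keeps attributes such as data-src from matching.

```php
<?php
// One pass over a single <img> tag: capture every src/title/alt pair,
// then key the values by attribute name so their order in the tag is irrelevant.
function srcTitleAlt(string $imgTag): array
{
    preg_match_all('/\s(src|title|alt)="([^"]*)"/i', $imgTag, $m);
    $found = array_combine(array_map('strtolower', $m[1]), $m[2]);
    // Fill in empty strings for attributes the tag does not have.
    return array_merge(['src' => '', 'title' => '', 'alt' => ''], $found);
}

$tag = '<img alt="a cute little fluffy bunny" src="/image/fluffybunny.jpg" title="Harvey the bunny" />';
print_r(srcTitleAlt($tag));
```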

Reply #5 · 2018-12-31 04:19

Here is THE solution, in PHP:

Just download QueryPath, and then do as follows:

$doc = qp($myHtmlDoc);

foreach ($doc->xpath('//img') as $img) {
    $src   = $img->attr('src');
    $title = $img->attr('title');
    $alt   = $img->attr('alt');
    // use $src, $title and $alt here
}

That's it, you're done!
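Whichever extraction method you pick, the last step in the question is the listing page itself. A sketch, assuming each image has already been reduced to an array with src, title and alt keys (for instance by collecting $src, $title and $alt from the loop above); renderImageList is a hypothetical name, and the arrow function needs PHP 7.4 or later. Every value goes through htmlspecialchars(), because titles and alt texts pulled from arbitrary HTML can contain quotes, ampersands or markup.

```php
<?php
// Render extracted image data as an HTML table.
// Each entry is expected to be ['src' => ..., 'title' => ..., 'alt' => ...].
function renderImageList(array $images): string
{
    // Escape everything: the values come from arbitrary HTML files.
    $e = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES, 'UTF-8');

    $rows = '';
    foreach ($images as $img) {
        $rows .= '<tr><td><img src="' . $e($img['src']) . '"></td>'
               . '<td>' . $e($img['title']) . '</td>'
               . '<td>' . $e($img['alt']) . '</td></tr>' . "\n";
    }
    return "<table>\n<tr><th>Image</th><th>Title</th><th>Alt</th></tr>\n" . $rows . "</table>\n";
}

echo renderImageList([
    ['src' => '/image/fluffybunny.jpg', 'title' => 'Harvey the bunny', 'alt' => 'a cute little fluffy bunny'],
]);
```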

Reply #6 · 2018-12-31 04:20

Here's a PHP function I cobbled together from all of the above info for a similar purpose, namely adjusting the width and height attributes of image tags on the fly. It's a bit clunky, perhaps, but it seems to work dependably:

function ReSizeImagesInHTML($HTMLContent, $MaximumWidth, $MaximumHeight) {

    // find image tags
    preg_match_all('/<img[^>]+>/i', $HTMLContent, $rawimagearray, PREG_SET_ORDER);

    // put image tags in a simpler array
    $imagearray = array();
    for ($i = 0; $i < count($rawimagearray); $i++) {
        array_push($imagearray, $rawimagearray[$i][0]);
    }

    // put image attributes in another array, keyed by attribute name,
    // so the order of src/width/height inside the tag does not matter
    $imageinfo = array();
    foreach ($imagearray as $img_tag) {
        preg_match_all('/\s(src|width|height)="([^"]*)"/i', $img_tag, $attrs);
        $imageinfo[$img_tag] = array_combine(array_map('strtolower', $attrs[1]), $attrs[2]);
    }

    // combine everything into one array
    $AllImageInfo = array();
    foreach ($imagearray as $img_tag) {

        // skip images that do not declare both width and height
        if (!isset($imageinfo[$img_tag]['width'], $imageinfo[$img_tag]['height'])) {
            continue;
        }

        $ImageSource = isset($imageinfo[$img_tag]['src']) ? $imageinfo[$img_tag]['src'] : '';
        $OriginalWidth = $imageinfo[$img_tag]['width'];
        $OriginalHeight = $imageinfo[$img_tag]['height'];

        $NewWidth = (int) $OriginalWidth;
        $NewHeight = (int) $OriginalHeight;
        $AdjustDimensions = "F";

        // too wide: shrink to the maximum width and scale the height by the same ratio
        if ($NewWidth > $MaximumWidth) {
            $NewHeight = floor($NewHeight * $MaximumWidth / $NewWidth);
            $NewWidth = $MaximumWidth;
            $AdjustDimensions = "T";
        }

        // still too tall: shrink to the maximum height and scale the width by the same ratio
        if ($NewHeight > $MaximumHeight) {
            $NewWidth = floor($NewWidth * $MaximumHeight / $NewHeight);
            $NewHeight = $MaximumHeight;
            $AdjustDimensions = "T";
        }

        $thisImageInfo = array('OriginalImageTag' => $img_tag, 'ImageSource' => $ImageSource, 'OriginalWidth' => $OriginalWidth, 'OriginalHeight' => $OriginalHeight, 'NewWidth' => $NewWidth, 'NewHeight' => $NewHeight, 'AdjustDimensions' => $AdjustDimensions);
        array_push($AllImageInfo, $thisImageInfo);
    }

    // build array of before and after tags
    $ImageBeforeAndAfter = array();
    for ($i = 0; $i < count($AllImageInfo); $i++) {

        if ($AllImageInfo[$i]['AdjustDimensions'] == "T") {
            $NewImageTag = str_ireplace('width="' . $AllImageInfo[$i]['OriginalWidth'] . '"', 'width="' . $AllImageInfo[$i]['NewWidth'] . '"', $AllImageInfo[$i]['OriginalImageTag']);
            $NewImageTag = str_ireplace('height="' . $AllImageInfo[$i]['OriginalHeight'] . '"', 'height="' . $AllImageInfo[$i]['NewHeight'] . '"', $NewImageTag);

            $thisImageBeforeAndAfter = array('OriginalImageTag' => $AllImageInfo[$i]['OriginalImageTag'], 'NewImageTag' => $NewImageTag);
            array_push($ImageBeforeAndAfter, $thisImageBeforeAndAfter);
        }
    }

    // execute search and replace
    for ($i = 0; $i < count($ImageBeforeAndAfter); $i++) {
        $HTMLContent = str_ireplace($ImageBeforeAndAfter[$i]['OriginalImageTag'], $ImageBeforeAndAfter[$i]['NewImageTag'], $HTMLContent);
    }

    return $HTMLContent;
}
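The function above edits the markup through string replacement, so it depends on the exact spelling and quoting of the attributes. A DOM-based variant of the same idea, sketched here as an alternative (resizeImagesInHtml is a hypothetical name), computes one scale factor per image so the aspect ratio is preserved, and never enlarges an image. It returns only the contents of the implied <body> element, because loadHTML() wraps a fragment in <html><body>.

```php
<?php
// Scale every <img> that declares both width and height so it fits inside
// $maxW x $maxH, keeping its aspect ratio. Smaller images are left untouched.
function resizeImagesInHtml(string $html, int $maxW, int $maxH): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        if (!$img->hasAttribute('width') || !$img->hasAttribute('height')) {
            continue;
        }
        $w = (int) $img->getAttribute('width');
        $h = (int) $img->getAttribute('height');
        if ($w <= 0 || $h <= 0) {
            continue; // e.g. empty or non-numeric values
        }
        // One factor for both axes; capping at 1.0 means images are never enlarged.
        $scale = min(1.0, $maxW / $w, $maxH / $h);
        if ($scale < 1.0) {
            // round() avoids floating-point results like 199.999... turning into 199
            $img->setAttribute('width', (string) (int) round($w * $scale));
            $img->setAttribute('height', (string) (int) round($h * $scale));
        }
    }

    // loadHTML() wrapped the fragment in <html><body>; return only the body's contents.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo resizeImagesInHtml('<div><img src="a.jpg" width="400" height="200"><img src="b.jpg" width="50" height="50"></div>', 200, 200);
```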

Reply #7 · 2018-12-31 04:20

Here is my solution for retrieving only the image tags from the content of any WordPress post or other HTML content:

$content = get_the_content();
$image = array();
$offset = 0;

// walk through the content and cut out each <img ...> tag in turn
while (($imgBeg = strpos($content, '<img', $offset)) !== false) {
  $imgEnd = strpos($content, '>', $imgBeg);
  if ($imgEnd === false) {
    break; // unterminated tag at the end of the content
  }
  $postOutput = substr($content, $imgBeg, $imgEnd - $imgBeg + 1);
  // drop hard-coded dimensions (only matches width="N" immediately followed by height="N")
  $postOutput = preg_replace('/width="([0-9]*)" height="([0-9]*)"/', '', $postOutput);
  $image[] = $postOutput;
  $offset = $imgEnd + 1;
}
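The same collection can be done without manual strpos() bookkeeping by letting DOMDocument find the tags and serializing each one with saveHTML($node). This sketch (imageTags is a hypothetical name) also removes width and height regardless of their order, which the preg_replace() above only does when width directly precedes height.

```php
<?php
// Collect each <img> tag as markup, with its width/height attributes removed.
function imageTags(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        // removeAttribute() is a no-op when the attribute is not present
        $img->removeAttribute('width');
        $img->removeAttribute('height');
        $tags[] = $doc->saveHTML($img); // passing a node serializes just that element
    }
    return $tags;
}

print_r(imageTags('<p>Text <img height="40" src="a.jpg" width="40"> more <img src="b.jpg" alt="B"></p>'));
```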

