How to decode base64 tag

Posted 2019-09-17 18:19

Question:

I want to know if it's possible to do something like this:

readfile(base64_decode_only_img_src_tags("mypage.html"));

I've been looking for a solution, but without results. The idea is to replace each base64-encoded line of an HTML file with its decoded equivalent, for example:

<img src="data:image/png;base64,**iVBORw0KGgoAAAANSUhEUgAAABAAAAAQAQMAAAAlPW0iAAAABlBMVEW/v7////+Zw/90AAAAEUlEQVQI12P4z8CAFWEX/Q8Afr8P8erzE9cAAAAASUVORK5CYII=**">

To:

<img src="/path/to/images/image.gif">

I know I should probably parse the code to find the lines with an img src attribute and then decode the part of those lines marked with **, but I don't know how to do that during the readfile call.

Thanks in advance.

Just as @mario said, I'm testing his code:

 $content = file_get_contents('newhtml.html');

function data_to_img($match) {
    list(, $img, $type, $base64, $end) = $match;
    $bin = base64_decode($base64);
    $md5 = md5($bin);   // generate a new temporary filename
    $fn = "$md5.$type";
    file_exists($fn) or file_put_contents($fn, $bin);

    return "$img$fn$end";  // new <img> tag
}

If I try to echo:

 echo preg_replace_callback('#(<img[^>]+src=")data:image/(gif|png|jpeg);base64,([\w=+/]+)("[^>]*>)#', "data_to_img", $content);

And it worked with the HTML example above! Now I'm trying it with my real HTML file. I noticed that the img src values there are much longer than in the example I provided. A real img src from my file is too long to paste here, so please right-click the dog image and open its image information to see the base64 code. Thanks a lot!

html file with base64 images

UPDATE: It looks like this person had the same problem with long base64 strings and regex:

Link to the similar problem

UPDATE2: Mario solved my problem, thank you very much man. Here's the code and regex for preg_replace_callback:

echo preg_replace_callback('#(<img\s(?>(?!src=)[^>])*?src=")data:image/(gif|png|jpeg);base64,([\w=+/]++)("[^>]*>)#', "data_to_img", $content);
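
The thread does not say why the first pattern stopped working on the longer images. One plausible cause (an assumption, not confirmed here) is that PCRE hit an internal limit on the very long base64 run, in which case preg_replace_callback returns null instead of a string. A minimal sketch for making such a failure visible; the helper name replace_or_report is hypothetical:

```php
<?php
// Hypothetical helper: run preg_replace_callback and surface PCRE failures
// instead of silently echoing an empty result.
function replace_or_report(string $pattern, callable $callback, string $subject): string {
    $result = preg_replace_callback($pattern, $callback, $subject);
    if ($result === null) {
        // preg_last_error() returns one of the PREG_*_ERROR constants,
        // e.g. PREG_BACKTRACK_LIMIT_ERROR.
        throw new RuntimeException('PCRE error code: ' . preg_last_error());
    }
    return $result;
}
```

Calling replace_or_report($pattern, "data_to_img", $content) in place of the bare preg_replace_callback would show whether the regex engine, rather than the callback, is what fails.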

Answer 1:

You could do that. But it kind of defeats the purpose, and you would have to take care not to unpack images twice into the temporary directory (which this would imply).

echo preg_replace_callback('#(<img\s(?>(?!src=)[^>])*?src=")data:image/(gif|png|jpeg);base64,([\w=+/]++)("[^>]*>)#', "data_to_img", $content);

function data_to_img($match) {
    list(, $img, $type, $base64, $end) = $match;

    $bin = base64_decode($base64);
    $md5 = md5($bin);   // generate a new temporary filename
    $fn = "tmp/img/$md5.$type";
    file_exists($fn) or file_put_contents($fn, $bin);

    return "$img$fn$end";  // new <img> tag
}

(I've ignored the invalid ** markup here.)

In particular you can't combine that with readfile, as you need to capture the file contents yourself to rewrite it. And then it's still a task that should be applied beforehand, not ad-hoc on each request.
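
As a rough sketch of applying the rewrite beforehand rather than on each request: the function below takes the HTML as a string, writes the decoded images to a directory, and returns the rewritten HTML, which can then be saved once and served with readfile. The function name convert_inline_images, the $dir parameter, and the file names in the usage comment are assumptions for illustration only.

```php
<?php
// Hypothetical one-time converter: replaces inline base64 images with files in $dir.
function convert_inline_images(string $html, string $dir): string {
    $pattern = '#(<img\s(?>(?!src=)[^>])*?src=")data:image/(gif|png|jpeg);base64,([\w=+/]++)("[^>]*>)#';

    $result = preg_replace_callback($pattern, function (array $match) use ($dir) {
        list(, $img, $type, $base64, $end) = $match;

        $bin = base64_decode($base64);
        // The file name is derived from the content, so identical images are stored once.
        $fn = $dir . '/' . md5($bin) . '.' . $type;

        if (!is_dir($dir)) {
            mkdir($dir, 0777, true);
        }
        file_exists($fn) or file_put_contents($fn, $bin);

        return "$img$fn$end";  // <img> tag pointing at the extracted file
    }, $html);

    // preg_replace_callback returns null on a PCRE error; keep the input in that case.
    return $result ?? $html;
}

// Possible usage, run once (file names are placeholders):
// $html = convert_inline_images(file_get_contents('mypage.html'), 'tmp/img');
// file_put_contents('mypage.static.html', $html);
// Afterwards each request can simply call: readfile('mypage.static.html');
```

Because the image path comes from the MD5 of the decoded bytes, running the converter again does not unpack the same image twice.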



Answer 2:

Load the file contents into a variable (for example with file_get_contents, since readfile sends the file straight to the output) and use this regex:

data:image/png;base64,\*\*(.+?)\*\*
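
A small sketch of how that pattern could be applied. Note that it only matches when the literal ** markers from the question's example are present in the HTML, and only for PNG data; the helper name extract_marked_base64 is hypothetical.

```php
<?php
// Hypothetical helper: collect and decode every **-marked base64 PNG payload.
function extract_marked_base64(string $html): array {
    // Same pattern as above, wrapped in # delimiters for PCRE.
    preg_match_all('#data:image/png;base64,\*\*(.+?)\*\*#', $html, $matches);

    // $matches[1] holds the captured base64 strings; decode each one.
    return array_map('base64_decode', $matches[1]);
}
```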