Remove images from HTML the same way Gmail does

Posted 2019-05-26 03:56

I am writing a simple HTML email design editor in PHP that also shows a preview of how the email will look.

I think it would also be very useful to show the user how the email will look in a client such as Gmail with images turned off.

What is the best approach for this? Does anybody know how Gmail, Hotmail, etc. do it?

Do I simply remove the img src attributes and the CSS background: url(...) declarations with a regular expression?

I would like to remove the background images from both background="url" attributes used on tables and background-image:url(url); declarations used in inline CSS.

I found this question, which has the same kind of idea, although I would like to actually remove the img elements and background images from the HTML text.

Or could this code be modified to work with background images also?

7 answers
(account suspended)
Reply #2 · 2019-05-26 04:39

Using regular expressions to parse HTML is usually not recommended.

I think a better approach would be to parse the HTML server-side and manipulate it to remove the images or their src attributes. A library I've had success with is http://simplehtmldom.sourceforge.net/, but you can also use PHP's built-in DOM extension.

Removing background images might be trickier. You might have to use something like http://www.pelagodesign.com/sidecar/emogrifier/ to inline a rule such as {background: none} into the HTML elements. However, CSS background images are not supported in the latest versions of Microsoft Outlook, so I would recommend not using them at all from the start, so that the emails look consistent across most email clients.
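To show what inlining such a rule amounts to for one element, here is a minimal JavaScript sketch that appends an override to a single style attribute value. The function name is hypothetical, and it uses background-image rather than the background shorthand (an assumption on my part) so that background colors survive, which is presumably closer to what an images-off preview should look like.

```javascript
// Hypothetical helper: append a declaration that disables background images.
// Later declarations in the same style attribute win, so this overrides any
// earlier background-image or background shorthand image (unless the earlier
// one is marked !important).
function hideInlineBackground(style) {
  let s = (style || '').trim();
  if (s !== '' && !s.endsWith(';')) {
    s += ';'; // terminate the last existing declaration
  }
  return s + 'background-image:none';
}

console.log(hideInlineBackground('background:url(bg.png) #fff;color:red'));
// background:url(bg.png) #fff;color:red;background-image:none
```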

Viruses.
Reply #3 · 2019-05-26 04:42

You could also do this on the client side.

Using the following code, you should be able to do something like this, assuming modern browsers all behave the same (or use jQuery or similar):

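var xhr = new XMLHttpRequest();
xhr.open('GET', URL_FOR_EMAIL, true); // URL_FOR_EMAIL: where your preview HTML is served
xhr.onreadystatechange = function(event){
    if(xhr.readyState === 4 && xhr.status === 200){
        // The request is asynchronous, so work with the email only once it has arrived
        var email = HTMLParser(xhr.responseText);

        // getElementsByTagName returns a live list, so remove from the end;
        // an img is not necessarily a direct child of the body
        var imgs = email.getElementsByTagName('img');
        for(var i = imgs.length - 1; i >= 0; i--){
            imgs[i].parentNode.removeChild(imgs[i]);
        }

        // attach the email body to the DOM
        // do something with the images
    }
};
xhr.send();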

HTMLParser, based on a snippet from MDN. The MDN original used Firefox's privileged Components.classes API, which ordinary web pages cannot access, so this version uses the standard DOMParser:

function HTMLParser(aHTMLString){
  // Parse the string as a complete HTML document; scripts in it do not run
  var doc = new DOMParser().parseFromString(aHTMLString, "text/html");
  return doc.body;
}
Melony?
Reply #4 · 2019-05-26 04:42

I think the best way to do this while keeping the change reversible is to use a tag that does not process the src attribute.

For example, change every img into a br, keeping its attributes.

Print the filtered HTML first, then reverse the change (e.g. with JavaScript after an Ajax call) by searching for every br that has a src attribute and turning it back into an img.
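A minimal JavaScript sketch of that swap and its reversal, working on the raw HTML string. The function names are hypothetical, and because it uses regular expressions (which the other answers rightly caution against for HTML) it assumes "<img" and "<br" only appear as real tags, not inside comments, script blocks or attribute values.

```javascript
// Hide images: every <img ...> becomes <br ...>, keeping its attributes
// (including src). A <br> never loads anything, so no image is fetched.
function hideImages(html) {
  return html.replace(/<img\b/gi, '<br');
}

// Restore images: any <br> that carries a src attribute used to be an <img>.
// The lookahead requires whitespace before "src", so data-src is not matched.
function restoreImages(html) {
  return html.replace(/<br\b(?=[^>]*\ssrc\s*=)/gi, '<img');
}

const original = '<p>Hi<br><img src="logo.png" alt="Logo"></p>';
const hidden = hideImages(original);
console.log(hidden);                              // <p>Hi<br><br src="logo.png" alt="Logo"></p>
console.log(restoreImages(hidden) === original);  // true
```

Plain br tags without a src attribute are left alone, which is what makes the round trip safe for ordinary line breaks.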

我命由我不由天
Reply #5 · 2019-05-26 04:49

As tkone mentioned, perhaps JavaScript/jQuery is the answer.

This looks at all images in your preview area and changes their source to a placeholder image. The 'hideBG' class sets the background image to the placeholder as well.

jQuery

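// Point every image at the placeholder
$("#previewArea img").attr("src", "placeholder.jpg").addClass("hideBG");
// Elements with a background attribute or an inline background style
$("#previewArea [background], #previewArea [style*='background']")
  .removeAttr("background")
  .addClass("hideBG");

CSS

/* !important is needed to override an inline style attribute */
.hideBG{
  background-image: url("placeholder.jpg") !important;
}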

Not tested, but should work - depending on your setup and needs.

(account suspended)
Reply #6 · 2019-05-26 04:49

I've asked a question that has a similar solution (though not the same problem): How to strip specific tags and specific attributes from a string? (Solution)

The solution uses a server-side library that cleans (and formats) HTML input according to predefined settings. Configure it to remove all src attributes and all background properties.
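To illustrate the settings-driven idea, here is a minimal JavaScript sketch that cleans one element's attribute map. The settings object and function name are hypothetical and are not that library's API. It only handles attributes (src, background), not CSS properties inside style, and it uses a denylist to match the wording above; production sanitizers usually prefer an allow-list, so that attributes nobody thought of are dropped by default.

```javascript
// Hypothetical settings: attributes to strip from every element.
const settings = {
  deniedAttributes: ['src', 'background'],
};

// Return a copy of an element's attributes without the denied ones.
// Names are compared case-insensitively, as HTML attribute names are.
function cleanAttributes(attrs, config) {
  const denied = new Set(config.deniedAttributes.map(a => a.toLowerCase()));
  const result = {};
  for (const [name, value] of Object.entries(attrs)) {
    if (!denied.has(name.toLowerCase())) {
      result[name] = value;
    }
  }
  return result;
}

console.log(cleanAttributes({ src: 'logo.png', alt: 'Logo', BACKGROUND: 'bg.png' }, settings));
// { alt: 'Logo' }
```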

beautiful°
Reply #7 · 2019-05-26 04:50

I would also suggest using PHP's DOM extension instead of regular expressions, which are often inaccurate on HTML. Here is example code that strips all the img tags and all the background attributes from your string:

// ...loading the DOM
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // must be set before loading to take effect
@$dom->loadHTML($string);  // Using @ to hide any parse warnings caused by markup errors
// Strip all the img tags. getElementsByTagName() returns a live list,
// so copy it into an array before removing nodes
$imgs = array();
foreach($dom->getElementsByTagName('img') as $img) {
    $imgs[] = $img;
}
foreach($imgs as $img) {
    $img->parentNode->removeChild($img);
}
// Strip the 'background' attribute from every element that has one
// (body, but also table/tr/td, where email templates commonly use it)
$xpath = new DOMXPath($dom);
foreach($xpath->query('//*[@background]') as $el) {
    $el->removeAttribute('background');
}

$str = $dom->saveHTML();

In HTML 4 the background attribute is only defined on body (tables officially have just bgcolor), but browsers still render it on table, tr and td elements, and email templates rely on that, so the XPath query removes it from every element. To strip the background property from inline CSS, you can use sabberworm's PHP CSS Parser to parse the CSS retrieved from the DOM. Try this:

// Only elements that actually carry a style attribute need processing
$xpath = new DOMXPath($dom);
foreach($xpath->query('//*[@style]') as $tag) {
    // Wrap the declarations in a dummy rule so the parser accepts them
    $oParser = new CSSParser("p{".$tag->getAttribute('style')."}");
    $oCss = $oParser->parse();
    foreach($oCss->getAllRuleSets() as $oRuleSet) {
        $oRuleSet->removeRule('background');
        $oRuleSet->removeRule('background-image');
    }
    // Cut the dummy "p {" prefix and the trailing "}\n" from the rendered CSS
    $css = trim(substr_replace(substr_replace($oCss->__toString(), '', 0, 3), '', -2, 2));
    if($css === '') {
        // Nothing left (e.g. the style held only a background): drop the attribute
        $tag->removeAttribute('style');
    } else {
        $tag->setAttribute('style', $css);
    }
}
// Serialize again so the output includes the cleaned style attributes
$str = $dom->saveHTML();
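For the client-side variants discussed in this thread, the same style filtering can be sketched in JavaScript without a CSS parser. This is a minimal sketch with a hypothetical function name: it splits a style attribute value on top-level semicolons only, so a semicolon inside url(...) (for example in a data URI) or inside quotes does not break a declaration apart. It does not understand CSS comments or backslash escapes, and it removes only exactly-named properties, so background-image must be listed separately from background.

```javascript
// Remove the given properties from one inline style string.
function removeCssProperties(style, properties) {
  const drop = new Set(properties.map(p => p.toLowerCase()));
  const declarations = [];
  let current = '';
  let depth = 0;    // nesting level of parentheses
  let quote = null; // active quote character, if any

  for (const ch of style) {
    if (quote) {
      if (ch === quote) quote = null; // closing quote
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === '(') {
      depth++;
    } else if (ch === ')') {
      depth = Math.max(0, depth - 1);
    } else if (ch === ';' && depth === 0) {
      declarations.push(current); // end of a top-level declaration
      current = '';
      continue;
    }
    current += ch;
  }
  declarations.push(current);

  return declarations
    .map(d => d.trim())
    .filter(d => {
      const colon = d.indexOf(':');
      if (colon === -1) return false; // empty or malformed piece
      return !drop.has(d.slice(0, colon).trim().toLowerCase());
    })
    .join('; ');
}

const style = 'background:url("data:image/png;base64,AAAA");border: 1px solid black';
console.log(removeCssProperties(style, ['background', 'background-image']));
// border: 1px solid black
```

An empty result means the attribute can be removed altogether, mirroring the PHP version.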

Putting all this code together, if you have, for example:

$string = '<!DOCTYPE html>
<html><body background="http://yo.ur/background/dot/com" etc="an attribute value">
<img src="http://your.pa/th/to/image"><img src="http://anoth.er/path/to/image">
<div style="background-image:url(http://inli.ne/css/background);border: 1px solid black">div content...</div>
<div style="background:url(http://inli.ne/css/background);border: 1px solid black">2nd div content...</div>
</body></html>';

The PHP will output

<!DOCTYPE html>
<html><body etc="an attribute value">
<div style="border: 1px solid black;">div content...</div>
<div style="border: 1px solid black;">2nd div content...</div>
</body></html>