Removing html tags and content where tag content m

Posted 2019-07-25 05:15

I've extracted some HTML from GmailApp using .getBody() and would like to return HTML with a specific tag and its contents filtered out wherever the contents match any value in an array (specifically, links with certain text). Looking at this solution, I figure the easiest way to do this would be to use Xml.parse() and filter the resulting object, but I can't get beyond creating the XmlDocument.

For example, if:

var html = '<div>some text then <div><a href="http://example1.com">foo</a></div> and then <span>some <a href="http://example2.com">baa</a>,and finally <a href="http://example3.com">close</a></span></div>';

and

var linksToRemove = ['baa','foo'];

how could I return

var newHtml = '<div>some text then <div></div> and then <span>some ,and finally <a href="http://example3.com">close</a></span></div>';

Using

var obj = Xml.parse(html, true);

I can get an object to process, but it all falls apart from there. (I also considered just using .replace(), but given the issues with matching HTML with RegEx, I thought it best to avoid it.)

1 answer
Explosion°爆炸
#2 · 2019-07-25 05:51

Following the suggestion, I opted to try a regex:

var html = '<div>some text then <div><a href="http://example1.com">foo</a></div> and then <span>some <a href="http://example2.com">baa</a>,and finally <a href="http://example3.com">close</a></span></div>';

var linksToRemove = ['baa', 'foo'];
var newHtml = cleanBody(html, linksToRemove);

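/**
 * Removes links from HTML text when the link text matches an excluded value.
 * @param {string} html The HTML to be cleaned.
 * @param {string[]} exclude The array of link text to remove.
 * @returns {string} Cleaned HTML.
 */
function cleanBody(html, exclude) {
    html = html.replace(/\r?\n|\r|\t/g, ''); // strip line breaks and tabs
    // Escape regex metacharacters so each entry is matched as literal text
    var escaped = exclude.map(function (text) {
        return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    });
    // Match an <a> tag whose entire content is one of the excluded strings
    var re = '<a\\b[^>]*>(' + escaped.join('|') + ')<\\/a>';
    return html.replace(new RegExp(re, 'ig'), '');
}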

Test at http://jsfiddle.net/HdsPU/
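
To check the approach against the example in the question, here is a minimal, self-contained sketch. The names removeLinksByText and escapeForRegex are hypothetical and not part of the original answer. It assumes each link contains only plain text (no nested tags), and it adds two things that are my own assumptions about what is wanted: optional whitespace around the link text is tolerated, and an empty list returns the input unchanged.

```javascript
// Hypothetical helper: escape regex metacharacters so the text matches literally.
function escapeForRegex(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: remove <a>...</a> elements whose text is in `texts`.
// Assumes link content is plain text; links wrapping other tags are left alone.
function removeLinksByText(html, texts) {
    if (!texts || texts.length === 0) {
        return html; // nothing to remove; avoids building an empty alternation
    }
    var alternatives = texts.map(escapeForRegex).join('|');
    // \b keeps tags such as <abbr> from matching; \s* allows padded link text
    var pattern = '<a\\b[^>]*>\\s*(?:' + alternatives + ')\\s*</a>';
    return html.replace(new RegExp(pattern, 'gi'), '');
}

var sampleHtml = '<div>some text then <div><a href="http://example1.com">foo</a></div>' +
    ' and then <span>some <a href="http://example2.com">baa</a>,and finally ' +
    '<a href="http://example3.com">close</a></span></div>';

var cleaned = removeLinksByText(sampleHtml, ['baa', 'foo']);
// The "close" link is kept; the "foo" and "baa" links are removed.
```

With the sample input, cleaned should equal the newHtml string given in the question. A link such as <a href="#"><b>foo</b></a> would not be removed by this pattern, which is one of the regex limitations the question alludes to.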
