Remove HTML formatting in Razor MVC 3

Posted 2019-04-29 01:50

I am using MVC 3 and the Razor view engine.

What I am trying to do

I am making a blog using MVC 3, and I want to remove all HTML formatting tags such as <p>, <b>, <i>, etc.

I am currently using the following code, which does work:

 @{
 post.PostContent = post.PostContent.Replace("<p>", " ");   
 post.PostContent = post.PostContent.Replace("</p>", " ");
 post.PostContent = post.PostContent.Replace("<b>", " ");
 post.PostContent = post.PostContent.Replace("</b>", " ");
 post.PostContent = post.PostContent.Replace("<i>", " ");
 post.PostContent = post.PostContent.Replace("</i>", " ");
 }

I feel there has to be a better way to do this. Can anyone please guide me?

4 answers
爷、活的狠高调
#2 · 2019-04-29 02:08

Thanks, Alex Yaroshevich.

Here is what I use now:

post.PostContent = Regex.Replace(post.PostContent, @"<[^>]*>", String.Empty);
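The one-liner above could also be wrapped in a small helper so the view does not have to mutate the model. This is only a sketch: the class name `HtmlText` and method name `StripTags` are made up for illustration, and the pattern is the same `<[^>]*>` used above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative and not from the original post.
public static class HtmlText
{
    // Built once and reused across calls; same pattern as the one-liner above.
    private static readonly Regex TagPattern =
        new Regex(@"<[^>]*>", RegexOptions.Compiled);

    // Returns the input with everything between '<' and '>' removed.
    // A null or empty input yields an empty string.
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }
        return TagPattern.Replace(html, string.Empty);
    }
}
```

In a Razor view this might then be called as `@HtmlText.StripTags(post.PostContent)`, assuming the class is visible to the view.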
smile是对你的礼貌
#3 · 2019-04-29 02:10

The regular expression is slow. Use this instead; it's faster:

public static string StripHtmlTagByCharArray(string htmlString)
{
    char[] array = new char[htmlString.Length];
    int arrayIndex = 0;
    bool inside = false;

    for (int i = 0; i < htmlString.Length; i++)
    {
        char let = htmlString[i];
        if (let == '<')
        {
            inside = true;
            continue;
        }
        if (let == '>')
        {
            inside = false;
            continue;
        }
        if (!inside)
        {
            array[arrayIndex] = let;
            arrayIndex++;
        }
    }
    return new string(array, 0, arrayIndex);
}

You can take a look at http://www.dotnetperls.com/remove-html-tags

一纸荒年 Trace。
#4 · 2019-04-29 02:12

You can use a regular expression.

This article might help you.

别忘想泡老子
#5 · 2019-04-29 02:22

Just in case you want to use regex in .NET to strip the HTML tags, the following seems to work pretty well on the source code for this very page. It's better than some of the other answers on this page because it looks for actual HTML tags instead of blindly removing everything between < and >. Back in the BBS days, we typed <grin> a lot instead of :), so removing <grin> is not an option. :)

This solution only removes the tags. It does not remove the contents of those tags in situations where that might be important -- a script tag, for example. You'd see the script, but the script wouldn't execute because the script tag itself gets removed. Removing the contents of an HTML tag is VERY tricky, and practically requires that the HTML fragment be well formed...

Also note the RegexOptions.Singleline option. That's very important for any block of HTML, as there's nothing wrong with opening an HTML tag on one line and closing it on another.

string strRegex = @"</{0,1}(!DOCTYPE|a|abbr|acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|big|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|figcaption|figure|font|footer|form|frame|frameset|h1|h2|h3|h4|h5|h6|head|header|hr|html|i|iframe|img|input|ins|kbd|keygen|label|legend|li|link|main|map|mark|menu|menuitem|meta|meter|nav|noframes|noscript|object|ol|optgroup|option|output|p|param|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|source|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video|wbr){1}(\s*/{0,1}>|\s+.*?/{0,1}>)";
Regex myRegex = new Regex(strRegex, RegexOptions.Singleline);
string strTargetString = @"<p>Hello, World</p>";
string strReplace = @"";

return myRegex.Replace(strTargetString, strReplace);

I'm not saying this is the best answer. It's just an option and it worked great for me.
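The two points made in this answer (keeping text like <grin>, and needing RegexOptions.Singleline) can be tried out with a short sketch. The tag list below is deliberately shortened and hypothetical, and the class and method names are invented for the example; the answer's own pattern lists far more tags.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative only: names and the shortened tag list are assumptions.
public static class TagStripDemo
{
    // Same shape as the answer's pattern, with a much shorter tag list.
    private const string KnownTags = @"</?(a|b|br|div|i|p|span)(\s*/?>|\s+.*?/?>)";

    // Removes only tags whose names appear in KnownTags.
    public static string StripKnown(string html, RegexOptions options)
    {
        return Regex.Replace(html, KnownTags, string.Empty, options);
    }

    // Removes everything between '<' and '>', whatever it is.
    public static string StripAll(string html)
    {
        return Regex.Replace(html, @"<[^>]*>", string.Empty);
    }
}
```

With this sketch, `StripAll("ok <grin>")` drops the <grin>, while `StripKnown("ok <grin> <b>bold</b>", RegexOptions.Singleline)` leaves it in place and removes only the <b> tags. For a tag whose attributes continue on a second line, `StripKnown` with `RegexOptions.None` leaves the opening tag behind, because `.` does not match a newline unless Singleline is set.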
