What's the best way to remove <br> tags from t

Posted 2019-02-22 02:40

The .NET web system I'm working on allows the end user to input HTML-formatted text in some situations. In some of those places, we want to leave all the tags but strip off any trailing break tags (while leaving any breaks inside the body of the text).

What's the best way to do this? (I can think of ways to do this, but I'm sure they're not the best.)

7 answers
淡お忘
#2 · 2019-02-22 03:05

I'm trying to ignore the ambiguity in your original question, and read it literally. Here is an extension method that overloads TrimEnd to take a string.

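using System;

static class StringExtensions
{
    // Removes a single trailing occurrence of `remove` from `s`, if present.
    public static string TrimEnd(this string s, string remove)
    {
        // Ordinal comparison avoids culture-sensitive surprises when matching markup.
        if (!string.IsNullOrEmpty(remove) && s.EndsWith(remove, StringComparison.Ordinal))
        {
            return s.Substring(0, s.Length - remove.Length);
        }
        return s;
    }
}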

Here are some tests to show that it works:

        Debug.Assert("abc".TrimEnd("<br>") == "abc");
        Debug.Assert("abc<br>".TrimEnd("<br>") == "abc");
        Debug.Assert("<br>abc".TrimEnd("<br>") == "<br>abc");

I want to point out that this solution is easier to read than regex, probably faster than regex (you should use a profiler, not speculation, if you're concerned about performance), and useful for removing other things from the ends of strings.

Regex becomes more appropriate if your problem is more general than you stated (e.g., if you want to remove <BR> and </BR> and deal with trailing spaces or whatever).
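For that more general case, here is a minimal sketch of what such a regex might look like. The class name BreakTrimmer and the method TrimTrailingBreaks are hypothetical, and the pattern assumes break tags never carry attributes (it will not match something like <br class="x">). Note that it also removes whitespace sitting between, before, and after the trailing tags, which may or may not be what you want.

```csharp
using System;
using System.Text.RegularExpressions;

static class BreakTrimmer
{
    // Hypothetical helper, not part of the answer above.
    // Matches one or more trailing <br>, <br/>, <br /> or </br> tags (any case),
    // plus any whitespace around them, up to the absolute end of the string (\z).
    private static readonly Regex TrailingBreaks = new Regex(
        @"(?:\s*</?br\s*/?>)+\s*\z",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static string TrimTrailingBreaks(string html)
    {
        // Nothing to strip from null or empty input.
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }
        return TrailingBreaks.Replace(html, string.Empty);
    }
}
```

Because the pattern requires at least one tag, a string with no trailing break is returned unchanged, and breaks inside the body of the text are left alone.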

戒情不戒烟
#3 · 2019-02-22 03:06

A small change to bdukes's code, which should be faster as it doesn't backtrack.

public static Regex regex = new Regex(
    @"(?:\<br[^>]*\>)*$",
    RegexOptions.IgnoreCase
    | RegexOptions.CultureInvariant
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
);
// Replace returns a new string, so assign the result.
text = regex.Replace(text, string.Empty);
我欲成王,谁敢阻挡
#4 · 2019-02-22 03:09

As @Mitch said,

//  using System.Text.RegularExpressions;

/// <summary>
///  Regular expression built for C# on: Thu, Sep 25, 2008, 02:01:36 PM
///  Using Expresso Version: 2.1.2150, http://www.ultrapico.com
///  
///  A description of the regular expression:
///  
///  Match expression but don't capture it. [\<br\s*/?\>], any number of repetitions
///      \<br\s*/?\>
///          <
///          br
///          Whitespace, any number of repetitions
///          /, zero or one repetitions
///          >
///  End of line or string
///  
///  
/// </summary>
public static Regex regex = new Regex(
    @"(?:\<br\s*/?\>)*$",
    RegexOptions.IgnoreCase
    | RegexOptions.CultureInvariant
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
    );
// Replace returns a new string, so assign the result.
text = regex.Replace(text, string.Empty);
成全新的幸福
#5 · 2019-02-22 03:11

You can use a regex anchored at the end of the string to find and remove the trailing tags.
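One detail worth knowing when anchoring in .NET: without RegexOptions.Multiline, $ matches at the end of the string or just before a final newline, while \z matches only at the absolute end. The sketch below shows the difference; the class AnchorDemo and its method names are made up for illustration, and the pattern only covers the plain <br> spelling.

```csharp
using System.Text.RegularExpressions;

static class AnchorDemo
{
    // \z anchors at the absolute end of the string, so a trailing "\n"
    // after the last <br> prevents a match.
    public static string StripAtAbsoluteEnd(string html)
    {
        return Regex.Replace(html, @"(?:<br>)+\z", string.Empty, RegexOptions.IgnoreCase);
    }

    // $ (without RegexOptions.Multiline) also matches just before a final "\n",
    // so the tags are removed and the newline is kept.
    public static string StripAtDollar(string html)
    {
        return Regex.Replace(html, @"(?:<br>)+$", string.Empty, RegexOptions.IgnoreCase);
    }
}
```

Which anchor fits depends on whether user input that ends in a newline should still have its trailing breaks removed.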

唯我独甜
#6 · 2019-02-22 03:15

You can use a regex, or check whether the string ends with a break tag and remove it.

小情绪 Triste *
#7 · 2019-02-22 03:20

I'm sure this isn't the best way either, but it should work unless you have trailing spaces or something.

while (myHtmlString.EndsWith("<br>"))
{
    // The method is Substring (lowercase "s"); 4 is the length of "<br>".
    myHtmlString = myHtmlString.Substring(0, myHtmlString.Length - 4);
}
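If trailing spaces or other spellings of the tag are a concern, the same loop can be extended without reaching for a regex. This is only a sketch: the class TrailingBreakLoop and the BreakTags list are hypothetical, and it assumes you are happy to lose trailing whitespace even when no break tag is present.

```csharp
using System;

static class TrailingBreakLoop
{
    // Hypothetical list of spellings to strip; extend it to suit your input.
    private static readonly string[] BreakTags = { "<br>", "<br/>", "<br />" };

    public static string TrimTrailingBreaks(string html)
    {
        if (html == null)
        {
            return null;
        }

        // Drop trailing whitespace first so EndsWith can see the tag.
        string result = html.TrimEnd();
        bool removed = true;
        while (removed)
        {
            removed = false;
            foreach (string tag in BreakTags)
            {
                // Ordinal, case-insensitive: "<BR>" is treated like "<br>".
                if (result.EndsWith(tag, StringComparison.OrdinalIgnoreCase))
                {
                    result = result.Substring(0, result.Length - tag.Length).TrimEnd();
                    removed = true;
                    break;
                }
            }
        }
        return result;
    }
}
```

Using tag.Length instead of a literal 4 keeps the arithmetic right for every spelling in the list.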