C# - Remove spaces in HTML source in between markups

Posted 2020-06-04 12:14

I am currently working on a program that lets me enter HTML source code into a RichTextBox control and removes the spaces between markups. The only problem is that I am not sure how to differentiate between the spaces BETWEEN the markups and the spaces INSIDE the markups. Obviously, removing the spaces inside the markups would be bad. Any ideas as to how I can tell the difference?

Example: (before white space is removed)

<p>blahblahblah</p>                  <p>blahblahblah</p>

Example: (after white space is removed)

<p>blahblahblah</p><p>blahblahblah</p>
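One way to tell the two kinds of spaces apart is to scan the input one character at a time and remember whether the scanner is currently inside a <...> tag. Whitespace seen outside a tag is held back and only discarded when it sits directly between a tag's closing > and the next <. This is a minimal sketch, not a full HTML parser: TagWhitespace and CollapseBetweenTags are hypothetical names, and it assumes simple markup (no > inside attribute values, and no special handling for comments, CDATA, or whitespace-sensitive elements such as <pre>).

```csharp
using System.Text;

public static class TagWhitespace
{
    // Hypothetical helper: drops whitespace runs that sit directly between '>' and '<'.
    // Whitespace inside a tag (between attributes) or inside text is kept.
    // Assumes simple markup: no '>' inside attribute values, no comments/CDATA/<pre>.
    public static string CollapseBetweenTags(string html)
    {
        var result = new StringBuilder(html.Length);
        var pending = new StringBuilder(); // whitespace held back until we see what follows
        bool insideTag = false;            // true between '<' and the matching '>'
        bool lastWasTagEnd = false;        // true if the last non-whitespace char seen was a tag's '>'

        foreach (char c in html)
        {
            if (insideTag)
            {
                // Copy tag contents verbatim, including spaces between attributes.
                result.Append(c);
                if (c == '>')
                {
                    insideTag = false;
                    lastWasTagEnd = true;
                }
            }
            else if (char.IsWhiteSpace(c))
            {
                pending.Append(c);
            }
            else if (c == '<')
            {
                // A new tag starts: the held whitespace is "between markups" only
                // if the previous non-whitespace character ended a tag.
                if (!lastWasTagEnd) result.Append(pending);
                pending.Clear();
                result.Append(c);
                insideTag = true;
            }
            else
            {
                // Ordinary text: the held whitespace belongs to the content, so keep it.
                result.Append(pending);
                pending.Clear();
                result.Append(c);
                lastWasTagEnd = false;
            }
        }

        result.Append(pending); // trailing whitespace is left as-is
        return result.ToString();
    }
}
```

For the example above, CollapseBetweenTags returns <p>blahblahblah</p><p>blahblahblah</p>, while the spaces in <p class="a b">x y</p> are left alone because they are either inside a tag or inside text. Like any rule that only looks at the neighbouring characters, it also removes the space in <b>Hello</b> <i>world</i>, which a browser would otherwise render.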

7条回答
别忘想泡老子
Reply #2 · 2020-06-04 12:48

I'm using the following. Off the top of my head, its shortcomings are that it doesn't handle angle brackets inside HTML comments or inside CDATA sections. Are there any other angle brackets in HTML that don't signify tags?

using System.Text.RegularExpressions;

public static class HtmlHelper
{
    // positive look behind for ">", one or more whitespace (non-greedy), positive lookahead for "<"
    private static readonly Regex InsignificantHtmlWhitespace = new Regex(@"(?<=>)\s+?(?=<)");

    // Known not to handle HTML comments or CDATA correctly, which we don't use.
    public static string RemoveInsignificantHtmlWhiteSpace(string html)
    {
        return InsignificantHtmlWhitespace.Replace(html, String.Empty).Trim();
    }
}
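The comment/CDATA shortcoming mentioned above could be addressed with one possible extension (a sketch of my own, not the answerer's code): add an alternation branch that matches a whole comment or CDATA section first and returns it unchanged through a MatchEvaluator, so the whitespace rule never looks inside it. HtmlWhitespaceSketch and RemoveInsignificantWhitespace are hypothetical names; the sketch assumes every comment and CDATA section is properly closed, and it does nothing special for <pre> or <script> content.

```csharp
using System.Text.RegularExpressions;

public static class HtmlWhitespaceSketch
{
    // Branch 1 ("skip") matches a complete comment or CDATA section.
    // Branch 2 matches whitespace preceded by '>' and followed by '<', as in HtmlHelper.
    // The regex finds the leftmost match, so a comment is consumed whole
    // before branch 2 can ever look at the characters inside it.
    private static readonly Regex Pattern = new Regex(
        @"(?<skip><!--.*?-->|<!\[CDATA\[.*?\]\]>)|(?<=>)\s+(?=<)",
        RegexOptions.Singleline); // lets '.' span multi-line comments

    public static string RemoveInsignificantWhitespace(string html)
    {
        // Keep comment/CDATA matches verbatim; delete the inter-tag whitespace matches.
        return Pattern
            .Replace(html, m => m.Groups["skip"].Success ? m.Value : string.Empty)
            .Trim();
    }
}
```

For example, <p>a</p>   <!-- x >   < y -->   <p>b</p> becomes <p>a</p><!-- x >   < y --><p>b</p>: the spaces around the comment are removed, but the ones inside it survive, whereas the original pattern would have turned x >   < y into x >< y.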