I found similar questions and answers for Python and Javascript, but not for C# or any other WinRT compatible language.

The reason I think I need it, is because I'm displaying text I get from websites in a Windows 8 store app. E.g. é should become é.

Or is there a better way? I'm not displaying websites or rss feeds, but just a list of websites and their titles.

Answer 1:

我建议使用System.Net.WebUtility.HtmlDecode和NOT HttpUtility.HtmlDecode 。

这是由于事实System.Web参考不的WinForms / WPF /控制台应用程序存在，你可以使用这个类（已经被添加作为在所有这些项目的引用）完全相同的结果。

用法：

string s =  System.Net.WebUtility.HtmlDecode("&eacute;"); // Returns é

Answer 2:

这可能是有用的，替换所有（针对至于我的要求去）与它们的Unicode等同实体。

    public string EntityToUnicode(string html) {
        var replacements = new Dictionary<string, string>();
        var regex = new Regex("(&[a-z]{2,5};)");
        foreach (Match match in regex.Matches(html)) {
            if (!replacements.ContainsKey(match.Value)) { 
                var unicode = HttpUtility.HtmlDecode(match.Value);
                if (unicode.Length == 1) {
                    replacements.Add(match.Value, string.Concat("&#", Convert.ToInt32(unicode[0]), ";"));
                }
            }
        }
        foreach (var replacement in replacements) {
            html = html.Replace(replacement.Key, replacement.Value);
        }
        return html;
    }

Answer 3:

使用HttpUtility.HtmlDecode() .Read MSDN上这里

decodedString = HttpUtility.HtmlDecode(myEncodedString)

Answer 4:

不同的编码/编码在Metro应用和WP8应用HTML实体和HTML数字。

随着Windows运行时Metro应用

{
    string inStr = "ó";
    string auxStr = System.Net.WebUtility.HtmlEncode(inStr);
    // auxStr == &#243;
    string outStr = System.Net.WebUtility.HtmlDecode(auxStr);
    // outStr == ó
    string outStr2 = System.Net.WebUtility.HtmlDecode("&oacute;");
    // outStr2 == ó
}

随着Windows Phone 8.0

{
    string inStr = "ó";
    string auxStr = System.Net.WebUtility.HtmlEncode(inStr);
    // auxStr == &#243;
    string outStr = System.Net.WebUtility.HtmlDecode(auxStr);
    // outStr == &#243;
    string outStr2 = System.Net.WebUtility.HtmlDecode("&oacute;");
    // outStr2 == ó
}

为了解决这个问题，在WP8，我已经实现了在表HTML ISO-8859-1参考调用之前System.Net.WebUtility.HtmlDecode()

Answer 5:

这为我工作，取代了常见的和Unicode实体。

private static readonly Regex HtmlEntityRegex = new Regex("&(#)?([a-zA-Z0-9]*);");

public static string HtmlDecode(this string html)
{
    if (html.IsNullOrEmpty()) return html;
    return HtmlEntityRegex.Replace(html, x => x.Groups[1].Value == "#"
        ? ((char)int.Parse(x.Groups[2].Value)).ToString()
        : HttpUtility.HtmlDecode(x.Groups[0].Value));
}

[Test]
[TestCase(null, null)]
[TestCase("", "")]
[TestCase("&#39;fark&#39;", "'fark'")]
[TestCase("&quot;fark&quot;", "\"fark\"")]
public void should_remove_html_entities(string html, string expected)
{
    html.HtmlDecode().ShouldEqual(expected);
}

Answer 6:

改进Zumey方法（我不能老是评论那里）。最大字符尺寸是在实体：＆惊叹号; （11）。在实体上情况也是可能的，当然。 A（源来自维基）

public string EntityToUnicode(string html) {
        var replacements = new Dictionary<string, string>();
        var regex = new Regex("(&[a-zA-Z]{2,11};)");
        foreach (Match match in regex.Matches(html)) {
            if (!replacements.ContainsKey(match.Value)) { 
                var unicode = HttpUtility.HtmlDecode(match.Value);
                if (unicode.Length == 1) {
                    replacements.Add(match.Value, string.Concat("&#", Convert.ToInt32(unicode[0]), ";"));
                }
            }
        }
        foreach (var replacement in replacements) {
            html = html.Replace(replacement.Key, replacement.Value);
        }
        return html;
    }

文章来源: Converting HTML entities to Unicode Characters in C#