I found similar questions and answers for Python and JavaScript, but not for C# or any other WinRT-compatible language.
The reason I think I need it is that I'm displaying text I get from websites in a Windows 8 Store app. E.g. &eacute; should become é.
Or is there a better way? I'm not displaying websites or RSS feeds, just a list of websites and their titles.
Answer 1:
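I recommend using System.Net.WebUtility.HtmlDecode and NOT HttpUtility.HtmlDecode.
The reason is that the System.Web reference does not exist in WinForms / WPF / console applications, and you can get exactly the same result with this class (which is already referenced in all of those project types).
Usage:
string s = System.Net.WebUtility.HtmlDecode("&eacute;"); // Returns é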
Answer 2:
This might be useful: it replaces all entities (as far as my requirements go) with their numeric Unicode equivalents.
public string EntityToUnicode(string html) {
    var replacements = new Dictionary<string, string>();
    // Matches lower-case named entities of 2 to 5 letters, e.g. &eacute;
    var regex = new Regex("(&[a-z]{2,5};)");
    foreach (Match match in regex.Matches(html)) {
        if (!replacements.ContainsKey(match.Value)) {
            var unicode = HttpUtility.HtmlDecode(match.Value);
            // Only entities that decode to a single character are replaced
            if (unicode.Length == 1) {
                replacements.Add(match.Value, string.Concat("&#", Convert.ToInt32(unicode[0]), ";"));
            }
        }
    }
    foreach (var replacement in replacements) {
        html = html.Replace(replacement.Key, replacement.Value);
    }
    return html;
}
Answer 3:
Use HttpUtility.HtmlDecode(). See the MSDN documentation for details.
string decodedString = HttpUtility.HtmlDecode(myEncodedString);
Answer 4:
HTML entities and HTML numeric references are encoded/decoded differently in Metro apps and WP8 apps.
With a Windows Runtime Metro app:
{
    string inStr = "ó";
    string auxStr = System.Net.WebUtility.HtmlEncode(inStr);
    // auxStr == &#243;
    string outStr = System.Net.WebUtility.HtmlDecode(auxStr);
    // outStr == ó
    string outStr2 = System.Net.WebUtility.HtmlDecode("&oacute;");
    // outStr2 == ó
}
With Windows Phone 8.0:
{
    string inStr = "ó";
    string auxStr = System.Net.WebUtility.HtmlEncode(inStr);
    // auxStr == &#243;
    string outStr = System.Net.WebUtility.HtmlDecode(auxStr);
    // outStr == &#243;
    string outStr2 = System.Net.WebUtility.HtmlDecode("&oacute;");
    // outStr2 == ó
}
To solve this on WP8, I implemented the table from the HTML ISO-8859-1 Reference and apply it before calling System.Net.WebUtility.HtmlDecode().
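The answer does not include its table, so the following is only a rough sketch of what such a pre-pass might look like, not the answerer's actual code. The class name EntityFallback, the method Decode, and the five-entry table are all hypothetical; a complete version would list every entity from the ISO-8859-1 reference. It assumes the platform decoder may leave some Latin-1 named or decimal references untouched, so those are resolved first and everything else is passed on to System.Net.WebUtility.HtmlDecode().

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class EntityFallback
{
    // Hypothetical, deliberately partial table: a few named entities from the
    // ISO-8859-1 (Latin-1) range. Entity names are case-sensitive.
    private static readonly Dictionary<string, char> Latin1Entities =
        new Dictionary<string, char>
        {
            { "nbsp",   '\u00A0' },
            { "Agrave", '\u00C0' },
            { "eacute", '\u00E9' },
            { "oacute", '\u00F3' },
            { "uuml",   '\u00FC' },
        };

    // Matches decimal references (&#243;) or named entities (&oacute;).
    private static readonly Regex EntityRegex =
        new Regex("&(?:#([0-9]{1,7})|([A-Za-z][A-Za-z0-9]*));");

    public static string Decode(string html)
    {
        if (string.IsNullOrEmpty(html)) return html;

        string preDecoded = EntityRegex.Replace(html, m =>
        {
            if (m.Groups[1].Success)
            {
                int code = int.Parse(m.Groups[1].Value);
                // Only the Latin-1 range is handled here; other code points
                // are left as-is for the platform decoder.
                return code >= 160 && code <= 255
                    ? ((char)code).ToString()
                    : m.Value;
            }

            char c;
            return Latin1Entities.TryGetValue(m.Groups[2].Value, out c)
                ? c.ToString()
                : m.Value;
        });

        // Anything the table did not cover goes to the built-in decoder.
        return WebUtility.HtmlDecode(preDecoded);
    }
}
```

Usage would be along the lines of string title = EntityFallback.Decode("caf&eacute; &#243;");. Doing the table lookup in a single regex pass, rather than with repeated string.Replace calls, avoids decoding the same text twice (for example, &amp;eacute; still ends up as the literal text &eacute;).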
Answer 5:
This worked for me; it replaces both common named entities and numeric (Unicode) entities.
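private static readonly Regex HtmlEntityRegex = new Regex("&(#)?([a-zA-Z0-9]+);");
public static string HtmlDecode(this string html)
{
    if (string.IsNullOrEmpty(html)) return html;
    return HtmlEntityRegex.Replace(html, m =>
    {
        int code;
        // Decimal references such as &#39; are converted directly; named or
        // hex entities (where TryParse fails) are left to HttpUtility.
        return m.Groups[1].Value == "#" && int.TryParse(m.Groups[2].Value, out code) && code <= char.MaxValue
            ? ((char)code).ToString()
            : HttpUtility.HtmlDecode(m.Value);
    });
}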
[Test]
[TestCase(null, null)]
[TestCase("", "")]
[TestCase("&#39;fark&#39;", "'fark'")]
[TestCase("&quot;fark&quot;", "\"fark\"")]
public void should_remove_html_entities(string html, string expected)
{
    html.HtmlDecode().ShouldEqual(expected);
}
Answer 6:
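An improvement on Zumey's method (I can't comment there). The longest entity name is &exclamation; (11 characters). Upper-case letters in entity names are also possible, e.g. &Agrave;. (Source: Wikipedia)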
public string EntityToUnicode(string html) {
    var replacements = new Dictionary<string, string>();
    var regex = new Regex("(&[a-zA-Z]{2,11};)");
    foreach (Match match in regex.Matches(html)) {
        if (!replacements.ContainsKey(match.Value)) {
            var unicode = HttpUtility.HtmlDecode(match.Value);
            if (unicode.Length == 1) {
                replacements.Add(match.Value, string.Concat("&#", Convert.ToInt32(unicode[0]), ";"));
            }
        }
    }
    foreach (var replacement in replacements) {
        html = html.Replace(replacement.Key, replacement.Value);
    }
    return html;
}
Source: Converting HTML entities to Unicode Characters in C#