Is there a quick way to recognize HTML ASCII codes

Posted 2019-02-18 06:35

Question:

Here are some HTML ASCII Codes:

http://www.ascii.cl/htmlcodes.htm

I have a string that may look like "All in a hard day&#39;s work"

What is the best way to replace that ASCII code with an apostrophe?

Answer 1:

Use Android's Html.fromHtml(String) to decode the string. It returns a Spanned, so call toString() on the result if you need a plain String.



Answer 2:

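Use Apache's StringEscapeUtils.unescapeHtml(String) to decode the codes; StringEscapeUtils.escapeHtml(String) does the reverse. Both are found in the Apache Commons Lang library (in Commons Lang 3 they are named unescapeHtml4 and escapeHtml4).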

If you need to preserve the HTML markup and only replace the ASCII codes, you will have to construct a Map of the codes you want to unescape. It's an exercise in String manipulation, so it may be considered an 'ugly hack', but it will run quickly.

For example, in pseudocode: create a Map<String, String> and populate it with each HTML code you want to replace as the key and its replacement text as the value. Find the HTML ASCII codes in the document using a regular expression, look each one up in your Map of replacements, and replace the occurrence with its text equivalent.

I will post some code over the weekend if I get a chance.
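
The Map-and-regex steps described above could be sketched roughly as follows, using only the Java standard library. This is an illustration rather than the answerer's actual code: the class name HtmlCodeReplacer, the method replaceCodes, and the three entries in the table are assumptions chosen for the example. Codes that are not in the table are left untouched, and &lt; and &gt; are deliberately not included so that decoded text cannot be mistaken for markup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlCodeReplacer {
    // Hypothetical, deliberately small table: HTML code -> plain-text equivalent.
    private static final Map<String, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put("&#39;", "'");
        REPLACEMENTS.put("&quot;", "\"");
        REPLACEMENTS.put("&amp;", "&");
    }

    // Matches numeric codes such as &#39; and named codes such as &amp;
    private static final Pattern HTML_CODE = Pattern.compile("&#?[A-Za-z0-9]+;");

    static String replaceCodes(String input) {
        Matcher matcher = HTML_CODE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String code = matcher.group();
            // Look the code up; keep it unchanged if it is not in the table.
            String replacement = REPLACEMENTS.getOrDefault(code, code);
            // quoteReplacement stops '$' and '\' from being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceCodes("All in a hard day&#39;s work"));
    }
}
```

Because every match is resolved with a single Map lookup in one pass over the input, the text is never rescanned, so a decoded "&" cannot accidentally combine with following characters to form a new code.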