Remove HTML tags from a String

Posted 2018-12-31 01:38

Is there a good way to remove HTML from a Java string? A simple regex like

 replaceAll("\\<.*?>","") 

will work for simple input, but entities such as &amp; won't be decoded, and any non-HTML text that happens to sit between two angle brackets will be removed as well (i.e. whatever the .*? in the regex matches will disappear).
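
Both limitations are easy to reproduce. The following is only a minimal sketch of the problem, not a proposed solution; the class name RegexStripDemo and the method name strip are made up for illustration.

```java
// Hypothetical demo class; only String.replaceAll from the JDK is used.
final class RegexStripDemo {

    // The regex from the question: drop everything from '<' to the next '>'.
    static String strip(String html) {
        return html.replaceAll("\\<.*?>", "");
    }

    public static void main(String[] args) {
        // The tags are removed, but the entity is left undecoded.
        System.out.println(strip("<p>Tom &amp; Jerry</p>"));
        // Plain text between the two angle brackets is removed too.
        System.out.println(strip("if a < b and c > d then stop"));
    }
}
```

The first call leaves "Tom &amp; Jerry" with the entity intact, and the second call loses "b and c" because the reluctant match runs from the "<" to the first ">" that follows it.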

27 answers
呛了眼睛熬了心
#2 · 2018-12-31 02:04

Here is another way to do it:

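public static String removeHTML(String input) {
    StringBuilder s = new StringBuilder();
    boolean inTag = false;

    // Walk the input once and copy only the characters outside <...>.
    for (int i = 0; i < input.length(); i++) {
        char c = input.charAt(i);
        if (c == '<') {
            inTag = true;              // a tag starts here
        } else if (c == '>' && inTag) {
            inTag = false;             // the tag ends here
        } else if (!inTag) {
            s.append(c);               // ordinary text, keep it
        }
    }
    return s.toString();
}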
闭嘴吧你
#3 · 2018-12-31 02:04

Example: classeString.replaceAll("\\<(/?[^\\>]+)\\>", "\\ ").replaceAll("\\s+", " ").trim()
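
A self-contained sketch of the same idea, with the redundant escapes dropped. It assumes the input is a small HTML fragment rather than a full document; TagToSpaceDemo and stripTags are hypothetical names.

```java
// Hypothetical demo class; only JDK String methods are used.
final class TagToSpaceDemo {

    static String stripTags(String html) {
        return html
                .replaceAll("<(/?[^>]+)>", " ") // each tag becomes one space
                .replaceAll("\\s+", " ")        // collapse whitespace runs
                .trim();                        // drop leading/trailing space
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<ul><li>one</li><li>two</li></ul>"));
    }
}
```

Replacing each tag with a space instead of an empty string keeps the text of adjacent elements from running together ("one two" rather than "onetwo"). The trade-off is a stray space around inline tags, so "<b>world</b>!" comes out as "world !".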

初与友歌
#4 · 2018-12-31 02:05

Here is one more variant, which removes all HTML tags, HTML entities, and runs of two or more spaces from HTML content:

content.replaceAll("(<.*?>)|(&.*?;)|([ ]{2,})", ""); where content is a String.
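
Note that the (&.*?;) branch deletes entities outright, so "Tom &amp; Jerry" loses its ampersand. If the entities should be decoded instead, one option is to strip the tags first and then replace a few known entities by hand. The sketch below is only an illustration under that assumption: the class EntityDecodeDemo, the method toPlainText, and the short entity list are not from the answer, and real-world HTML would need a complete entity table or a proper HTML parser.

```java
// Hypothetical demo class; only JDK String methods are used.
final class EntityDecodeDemo {

    static String toPlainText(String html) {
        // Remove the tags with the same reluctant pattern as above.
        String text = html.replaceAll("<.*?>", "");
        return text
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&")    // last, so "&amp;lt;" becomes "&lt;", not "<"
                .replaceAll(" {2,}", " ") // collapse space runs to a single space
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(toPlainText("<p>Tom &amp; Jerry</p>"));
    }
}
```

Decoding &amp; last matters: doing it first would turn the escaped text "&amp;lt;" into "&lt;" and then into "<", decoding it twice.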
