Is there a good way to remove HTML from a Java string? A simple regex like
replaceAll("\\<.*?>","")
will work, but entities like &amp;
won't be converted correctly, and non-HTML text between two angle brackets will be removed (i.e. whatever the .*?
in the regex matches will disappear).
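To make the two problems concrete, here is a minimal sketch that runs the regex from the question on three inputs. The class name RegexStripDemo and the sample strings are made up for illustration.

```java
final class RegexStripDemo {
    // The naive pattern from the question: a '<', the shortest possible
    // run of any characters, then a '>'.
    static String stripTags(String html) {
        return html.replaceAll("\\<.*?>", "");
    }

    public static void main(String[] args) {
        // Real tags are removed as intended.
        System.out.println(stripTags("<p>Hello <b>world</b></p>")); // Hello world

        // Entities are not decoded; the string comes back unchanged.
        System.out.println(stripTags("Tom &amp; Jerry"));           // Tom &amp; Jerry

        // Non-HTML text between '<' and '>' is swallowed:
        // "< b and c >" is treated as one tag and deleted.
        System.out.println(stripTags("if a < b and c > d"));
    }
}
```

The last call prints "if a", two spaces, then "d", which is why the answers below reach for a real HTML parser instead.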
Use
Html.fromHtml
HTML Tags are
As per Android's official documentation, any tags in the HTML will display as a generic replacement String, which your program can then go through and replace with real strings.
The Html.fromHtml
method takes an Html.TagHandler
and an Html.ImageGetter as arguments, as well as the text to parse. Example:
Then
Output
This is about me text that the user can put into their profile
This should work -
use this
and this
To get formatted plain HTML text, you can do this:
To get formatted plain text, replace <br/> with \n and change the last line to:
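As a rough, library-free illustration of the <br/>-to-\n idea (this is not the code the answer refers to, just a sketch using plain JDK string operations), line breaks can be converted before the remaining tags are dropped. The class name LineBreakDemo and the handling of </p> are assumptions, and the tag-stripping regex has the same limitations described in the question.

```java
final class LineBreakDemo {
    // Hypothetical helper: keeps line structure while discarding markup.
    static String toPlainText(String html) {
        return html
            .replaceAll("(?i)<br\\s*/?>", "\n")  // <br>, <br/>, <br /> become a newline
            .replaceAll("(?i)</p\\s*>", "\n\n")   // end of a paragraph becomes a blank line
            .replaceAll("<[^>]*>", "")           // drop whatever tags remain
            .trim();                             // remove leading/trailing whitespace
    }

    public static void main(String[] args) {
        System.out.println(toPlainText("<p>Line one<br/>Line two</p><p>Next</p>"));
    }
}
```

Entities such as &amp; are still left undecoded here; a real HTML parser handles those as well.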
It sounds like you want to go from HTML to plain text.
If that is the case, look at www.htmlparser.org. Here is an example that strips all the tags out of the HTML file found at a URL.
It makes use of org.htmlparser.beans.StringBean.
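For comparison, a similar text-extraction pass can be sketched without any third-party dependency, using the HTML parser bundled with the JDK in javax.swing.text.html. This is a different technique from htmlparser's StringBean, shown only as an assumption-laden sketch: the class name Html2Text and the method extract are made up for illustration, and whitespace handling between elements may differ from what a browser would show.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Collects only the text nodes that the JDK's Swing HTML parser reports.
final class Html2Text extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void handleText(char[] data, int pos) {
        // Called once per run of character data; tags never reach this method.
        text.append(data);
    }

    // Hypothetical convenience wrapper around the parser.
    static String extract(String html) {
        Html2Text callback = new Html2Text();
        try {
            // The third argument tells the parser to ignore charset declarations.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(extract("<p>Hello <b>world</b></p>"));
    }
}
```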
If you're writing for Android you can do this...
Also very simple using Jericho, and you can retain some of the formatting (line breaks and links, for example).