Is there a way of getting jsoup to clean a string with HTML in it by escaping the unwanted HTML rather than removing it completely? My example:
String dirty = "This is <b>REALLY</b> dirty code from <a href=\"www.rubbish.url.zzzz\">haxors-r-us</a>";
String clean = Jsoup.clean(dirty, new Whitelist().addTags("a").addAttributes("a", "href", "name", "rel", "target"));
This gives a "clean" string of:
This is REALLY dirty code from <a href="www.rubbish.url.zzzz">haxors-r-us</a>
What I want is for the "clean" string to be:
This is &lt;b&gt;REALLY&lt;/b&gt; dirty code from <a href="www.rubbish.url.zzzz">haxors-r-us</a>
Assuming you are parsing Strings rather than full HTML documents (as in your question), this method will work:
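The method itself is not shown here, so the following is only a minimal sketch of one way it could look, not necessarily the original code. The class name `HtmlEscaper` and method name `escapeHtml` are made up for illustration. It assumes a jsoup version that still ships `Whitelist` (newer releases rename that class to `Safelist`) and that has the single-argument `TextNode` constructor.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;
import org.jsoup.safety.Whitelist;

public class HtmlEscaper {

    // Hypothetical helper: escapes <b> elements instead of dropping them,
    // then cleans everything else with the whitelist from the question.
    public static String escapeHtml(String source) {
        // Parse the String as a body fragment rather than a full document.
        Document doc = Jsoup.parseBodyFragment(source);

        // Swap each <b> element for a text node holding its own markup.
        // jsoup escapes text nodes on output, so "<b>" comes out as "&lt;b&gt;".
        for (Element element : doc.select("b")) {
            element.replaceWith(new TextNode(element.outerHtml()));
        }

        // Anything else that is not whitelisted is still removed as usual.
        Whitelist allowed = new Whitelist()
                .addTags("a")
                .addAttributes("a", "href", "name", "rel", "target");
        return Jsoup.clean(doc.body().html(), allowed);
    }
}
```

The idea is that the escaped markup travels through `Jsoup.clean` as plain text, so the cleaner has nothing to strip from it.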
You could replace the hard-coded "b" tag with a parameter, so that callers can pass in a list of the tags they wish to escape.
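As a rough sketch of that idea: jsoup's `select` accepts a comma-separated group of selectors, so a list of tag names can be joined into one query and used in place of the literal "b". The class `TagSelector` and method `toSelector` below are hypothetical names, and the tag-name check is my own assumption about what input is reasonable.

```java
import java.util.List;

public class TagSelector {

    // Hypothetical helper: turns tag names into a selector group such as
    // "b, i", which could be passed to select() instead of the literal "b".
    public static String toSelector(List<String> tags) {
        if (tags.isEmpty()) {
            throw new IllegalArgumentException("at least one tag is required");
        }
        for (String tag : tags) {
            // Accept plain tag names only, so callers cannot inject selector syntax.
            if (!tag.matches("[A-Za-z][A-Za-z0-9]*")) {
                throw new IllegalArgumentException("not a plain tag name: " + tag);
            }
        }
        return String.join(", ", tags);
    }
}
```

For example, `toSelector(Arrays.asList("b", "i"))` yields a query that matches both "b" and "i" elements.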
The associated passing JUnit test:
Note that I added a line break ("\n") before your "a" tag in my test's "expected" String because jsoup pretty-prints its output.
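If you would rather not bake that line break into the expected String, one option is to compare the two Strings with whitespace collapsed. This is only a sketch of a test helper. `HtmlWhitespace.normalize` is a made-up name, and it is only appropriate where whitespace is not significant (so not for content inside "pre" tags, for instance).

```java
public class HtmlWhitespace {

    // Hypothetical test helper: collapses every run of whitespace
    // (spaces, tabs, line breaks) into a single space and trims the ends.
    public static String normalize(String html) {
        return html.replaceAll("\\s+", " ").trim();
    }
}
```

In a test you would then compare `normalize(expected)` with `normalize(actual)`, so the assertion does not depend on where the formatter places line breaks.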