I am evaluating jsoup for functionality that would sanitize (but not remove!) non-whitelisted tags. Let's say only the <b> tag is allowed, so the following input
foo <b>bar</b> <script onLoad='stealYourCookies();'>baz</script>
has to yield the following:
foo <b>bar</b> &lt;script onLoad='stealYourCookies();'&gt;baz&lt;/script&gt;
I see the following problems/questions with jsoup:
- document.getAllElements() always assumes <html>, <head> and <body>. Yes, I can call document.body().getAllElements(), but the point is that I don't know if my source is a full HTML document or just the body, and I want the result in the same shape and form as it came in;
- how do I replace <script>...</script> with &lt;script&gt;...&lt;/script&gt;? I only want to replace the brackets with escaped entities and do not want to alter any attributes, etc. Node.replaceWith sounds like overkill for this.
- Is it possible to completely switch off pretty printing (e.g. insertion of new lines, etc.)?
Or maybe I should use another framework? I have peeked at htmlcleaner so far, but the given examples don't suggest my desired functionality is supported.
Answer 1
How do you load / parse your Document with jsoup? If you use parse() or connect().get(), jsoup will automatically format your HTML (inserting html, body and head tags). This ensures you always have a complete HTML document, even if the input isn't complete.
Let's assume you only want to clean an input (no further processing): in that case you should use clean() instead of the previously listed methods.
Example 1 - Using parse()
Output:
The input HTML is completed to ensure you have a complete document.
Example 2 - Using clean()
Output:
The input HTML is cleaned, nothing more.
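The difference between the two calls can be sketched as follows. This is a minimal illustration rather than the answer's original snippet: it assumes jsoup 1.14.2 or newer on the classpath (where the allow-list class is named Safelist; older releases call it Whitelist), and the class and method names are made up for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

// Sketch only: assumes jsoup 1.14.2+ (Safelist); names below are hypothetical.
final class ParseVersusClean {

    // parse() builds a complete document, so html/head/body wrappers are added.
    static String viaParse(String html) {
        return Jsoup.parse(html).html();
    }

    // clean() treats the input as a body fragment and returns only that fragment,
    // keeping text plus the tags on the allow-list (here: just <b>).
    static String viaClean(String html) {
        return Jsoup.clean(html, Safelist.none().addTags("b"));
    }

    public static void main(String[] args) {
        String input = "foo <b>bar</b> <script onLoad='stealYourCookies();'>baz</script>";
        System.out.println(viaParse(input));
        System.out.println(viaClean(input));
    }
}
```

Note that clean() drops disallowed elements instead of escaping them, so on its own it does not produce the escaped output asked for in the question.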
Answer 2
The method replaceWith() does exactly what you need:
Example:
Output:
Or body only:
Output:
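One way to apply replaceWith() to the question's input is sketched here, with several assumptions: jsoup 1.11.1 or newer (for the single-argument TextNode constructor), Java 9+ (for Set.of), input parsed as a body fragment, and a hypothetical ALLOWED set standing in for the whitelist. Each disallowed element is swapped for a text node holding its own markup, which jsoup then escapes on output.

```java
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

// Sketch only: class, method and ALLOWED are illustrative names, not jsoup API.
final class EscapeDisallowedTags {

    // Hypothetical allow-list for this sketch: only <b> survives as markup.
    private static final Set<String> ALLOWED = Set.of("b");

    static String escapeDisallowed(String bodyHtml) {
        Document doc = Jsoup.parseBodyFragment(bodyHtml);
        doc.outputSettings().prettyPrint(false); // keep whitespace as parsed
        Element body = doc.body();
        // getAllElements() includes the body itself, so skip it explicitly.
        for (Element el : body.getAllElements()) {
            if (el == body || ALLOWED.contains(el.tagName())) {
                continue;
            }
            // The text node's content is escaped when the tree is serialized.
            el.replaceWith(new TextNode(el.outerHtml()));
        }
        return body.html(); // body only, without html/head/body wrappers
    }

    public static void main(String[] args) {
        System.out.println(escapeDisallowed(
                "foo <b>bar</b> <script onLoad='stealYourCookies();'>baz</script>"));
    }
}
```

Two caveats with this sketch: the escaped markup comes from jsoup's serializer, so attribute names and quoting are normalized (onLoad becomes onload, single quotes become double quotes), and an allowed tag nested inside a disallowed one is escaped together with its parent.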
Answer 3
Yes, the prettyPrint() method of Document.OutputSettings does this.
Example:
Note: if the outputSettings() method is not available, please update jsoup.
Output:
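A small sketch of the setting in action, assuming a jsoup version that exposes Document.outputSettings(); the class and helper names are invented for the illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Sketch only: PrettyPrintOff and bodyHtml are hypothetical names.
final class PrettyPrintOff {

    // Serializes the parsed body with pretty printing switched on or off.
    static String bodyHtml(String html, boolean prettyPrint) {
        Document doc = Jsoup.parseBodyFragment(html);
        doc.outputSettings().prettyPrint(prettyPrint);
        return doc.body().html();
    }

    public static void main(String[] args) {
        String input = "<div><p>a</p><p>b</p></div>";
        System.out.println(bodyHtml(input, true));  // indented, with line breaks
        System.out.println(bodyHtml(input, false)); // markup on one line, as given
    }
}
```

The same OutputSettings object can also be passed to the four-argument Jsoup.clean() overload if the cleaned fragment should not be reformatted either.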
Answer 4 (no bullet)
No! jsoup is one of the best and most capable HTML libraries out there!