I'm having an issue with the HTMLPurifier PHP library. I'm using a WYSIWYG editor named 'Summernote' for all text areas in my application.
When I write something inside Summernote like:
<script>alert('test');</script>
The post data comes through as
<p>&lt;script&gt;alert('test');&lt;/script&gt;</p>
However, once this is run through HTMLPurifier, it doesn't remove the script tags that have been converted into regular characters. So when I go to edit this text inside Summernote, it actually runs the script!
Here's an image of what is processed into the editor:
And here is how it's stored inside the database:
If anyone has any ideas please let me know!
EDIT: Also, if I disable the Summernote WYSIWYG editor, the tags are successfully removed from the textarea when cleaning with HTMLPurifier.
I suspect the underlying issue here is a common mistake:
When you're outputting the purified HTML into your WYSIWYG, you need to use htmlspecialchars() on it. So, instead of having this in the source code of the rendered page...
<textarea ...>
<p>&lt;script&gt;alert('test');&lt;/script&gt;</p>
</textarea>
...you need to have this:
<textarea ...>
&lt;p&gt;&amp;lt;script&amp;gt;alert('test');&amp;lt;/script&amp;gt;&lt;/p&gt;
</textarea>
Then the WYSIWYG should function as expected. (If it doesn't, Edward is actually right - you should look into a different editor.)
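A minimal sketch of that output step in PHP, assuming the purified HTML has already been loaded from storage into a variable. The function name renderEditorTextarea and the field name content are hypothetical and chosen only for illustration; htmlspecialchars() is the only real API involved.

```php
<?php
// Hypothetical helper: wraps stored (already purified) HTML in a textarea,
// escaping it so the browser treats it as text rather than markup.
function renderEditorTextarea(string $storedHtml): string
{
    $escaped = htmlspecialchars($storedHtml, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return '<textarea name="content">' . $escaped . '</textarea>';
}

// The value from the question, as it is stored after purification.
$stored = "<p>&lt;script&gt;alert('test');&lt;/script&gt;</p>";
echo renderEditorTextarea($stored), "\n";
```

The entities that are already in the stored value get escaped a second time (&lt; becomes &amp;lt;). That is intended: the browser undoes one level of escaping when it fills the textarea, so the editor receives the stored string unchanged.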
That is the correct way to do it because you want text in your textarea, not HTML. This is easiest to see if you consider a scenario without HTML Purifier, where someone enters a </textarea> tag followed by other tags. Those would break out of the <textarea>, WYSIWYG or not. So you put htmlspecialchars() around what you're outputting, which is only supposed to be text in the textarea. The fact that a textarea can cope with unescaped HTML tags at all is a coincidence, and a rather misleading one: it would probably be better if it didn't work, but most browsers will still show HTML tags as if they had been escaped if you don't do it.
Once the text is properly escaped, the WYSIWYG can come in, take the text, and interpret it as HTML.
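To make the </textarea> scenario concrete, here is a small sketch that builds the same markup with and without escaping. The input string is a made-up hostile value, not something from the question.

```php
<?php
// Hypothetical hostile input: closes the textarea early, then adds a script.
$input = "</textarea><script>alert('test');</script>";

// Unescaped: the input's own closing tag ends the textarea prematurely.
$unsafe = '<textarea>' . $input . '</textarea>';

// Escaped: the input stays plain text inside the textarea.
$safe = '<textarea>'
    . htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8')
    . '</textarea>';

// Count the literal closing tags in each version of the markup.
echo substr_count($unsafe, '</textarea>'), "\n"; // prints 2
echo substr_count($safe, '</textarea>'), "\n";   // prints 1
```

In the unescaped version, everything after the first closing tag is parsed as ordinary page markup, which is why the script would run.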
See if htmlspecialchars() fixes your issue. It should do that without causing side effects, even if that might seem counter-intuitive to you now.
(Of course, if you already use htmlspecialchars() as described, then I'm afraid I don't have an idea off-hand.)