I've run into an annoying situation. I want users to be able to use Textile, but they shouldn't be able to break the valid HTML surrounding their entry, so I have to escape the HTML somehow.
html_escape(textilize("</body>Foo"))
would break textile while
textilize(html_escape("</body>Foo"))
would work, but breaks various Textile features such as links (written like "Linkname":http://www.wheretogo.com/), since the quotes would be transformed into &quot; and thus no longer be detected by Textile.
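To make the ordering problem concrete, here is a minimal sketch using only Ruby's standard library. ERB::Util.html_escape stands in for the html_escape helper; TEXTILE_LINK is a hypothetical, heavily simplified stand-in for Textile's link syntax (not RedCloth's actual parser), used only to show that the pattern stops matching once the quotes are escaped.

```ruby
require "erb"

# Hypothetical, simplified approximation of the Textile link syntax: "text":url
TEXTILE_LINK = /"([^"]+)":(\S+)/

input   = '"Linkname":http://www.wheretogo.com/'
escaped = ERB::Util.html_escape(input)

puts escaped                       # the double quotes are now &quot; entities
puts TEXTILE_LINK.match?(input)    # the raw input still looks like a link
puts TEXTILE_LINK.match?(escaped)  # the escaped input no longer does
```

Escaping first protects the page, but the link marker is destroyed before Textile ever sees it.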
sanitize
doesn't do a better job.
Any suggestions on that one? I would prefer not to use Tidy for this problem.
Thanks in advance.
For those who run into the same problem: If you are using the RedCloth gem you can just define your own method (in one of your helpers).
def safe_textilize( s )
  if s && s.respond_to?(:to_s)
    doc = RedCloth.new( s.to_s )
    doc.filter_html = true
    doc.to_html
  end
end
Excerpt from the Documentation:
Accessors for setting security restrictions.
This is a nice thing if you're using RedCloth for formatting in
public places (e.g. Wikis) where you don't want users to abuse HTML for bad things.
If filter_html
is set, HTML which wasn't created by the Textile processor will be
escaped. Alternatively, if sanitize_html
is set, HTML can pass through the Textile
processor but unauthorized tags and attributes will be removed.
This works for me and guards against every XSS attack I've tried including onmouse... handlers in pre and code blocks:
<%= RedCloth.new( sanitize( @comment.body ), [:filter_html, :filter_styles, :filter_classes, :filter_ids] ).to_html -%>
The initial sanitize removes a lot of potential XSS exploits including mouseovers.
As far as I can tell, :filter_html escapes most HTML tags apart from code and pre. The other filters are there because I don't want users applying any classes, ids, or styles.
I just tested my comments page with your example
"</body>Foo"
and it completely removed the rogue body tag.
I am using RedCloth version 4.2.3 and Rails version 2.3.5.
Looks like textile simply doesn't support what you want.
You really want to only allow a carefully controlled subset of HTML, but textile is designed to allow arbitrary HTML. I don't think you can use textile at all in this situation (unless it supports that kind of restriction).
What you need is probably a special "restricted" version of Textile that only allows "safe" markup (defining that, however, might already be tricky). I do not know whether one exists.
You might have a look at BBCode, which lets you restrict the permitted markup.