How can I allow my user to insert HTML code, witho

Posted 2019-01-17 17:27

Question:

I developed a web application that lets my users manage some aspects of a website dynamically (yes, a kind of CMS) in a LAMP environment (Debian, Apache, PHP, MySQL).

For example, they create a news item in their private area on my server, and it is then published on their website via a cURL request (or via AJAX).

The news item is created with a WYSIWYG editor (FCKeditor at the moment, probably TinyMCE in the near future).

So I can't disallow HTML tags, but how can I stay safe? Which tags MUST I remove (JavaScript?)? That covers being safe on the server side, but how can I be 'legally' safe? If a user uses my application to carry out XSS, could I get into legal trouble?

Answer 1:

If you are using PHP, an excellent solution is HTMLPurifier. It has many options for filtering out bad stuff and, as a side effect, guarantees well-formed HTML output. I use it to view spam, which can be a hostile environment.



Answer 2:

It doesn't really matter what you're looking to remove; someone will always find a way around it. For reference, take a look at this XSS Cheat Sheet.

As an example, how are you ever going to remove this valid XSS attack:

<IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>
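To see why a pattern-matching filter misses this payload, it helps to decode the character references. The sketch below is in Python purely for illustration (the thread itself is about PHP); `naive_filter_flags` is a hypothetical stand-in for a filter that searches the raw input for a known bad string.

```python
from html import unescape

# The payload from the answer above, copied verbatim.
payload = (
    "<IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74"
    "&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>"
)


def naive_filter_flags(markup: str) -> bool:
    """Hypothetical blacklist check: look for the literal text 'javascript:'."""
    return "javascript:" in markup.lower()


# Decode the numeric character references the way an HTML parser would.
decoded = unescape(payload)

print(naive_filter_flags(payload))  # the raw input slips past the check
print(naive_filter_flags(decoded))  # the decoded form is caught
print(decoded)
```

The raw string never contains the text the filter looks for, so any check that runs before decoding is blind to it.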

Your best option is to allow only a subset of acceptable tags and remove everything else. This practice is known as whitelisting and is the best method for preventing XSS (besides disallowing HTML altogether).

Also use the cheat sheet in your testing; fire as much as you can at your website and try to find some ways to perform XSS.



Answer 3:

The general best strategy here is to whitelist specific tags and attributes that you deem safe, and escape/remove everything else. For example, a sensible whitelist might be <p>, <ul>, <ol>, <li>, <strong>, <em>, <pre>, <code>, <blockquote>, <cite>. Alternatively, consider human-friendly markup like Textile or Markdown that can be easily converted into safe HTML.
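A minimal sketch of that whitelist idea, written in Python with only the standard library for illustration. The names `AllowlistSanitizer` and `sanitize` are hypothetical, and for simplicity this version drops every attribute rather than whitelisting some of them. It does not balance tags or handle every parser edge case, so treat it as a demonstration of the principle, not a replacement for a vetted library such as HTMLPurifier.

```python
from html import escape
from html.parser import HTMLParser

# The example whitelist from the answer above.
ALLOWED_TAGS = {
    "p", "ul", "ol", "li", "strong", "em",
    "pre", "code", "blockquote", "cite",
}


class AllowlistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags, without attributes, and escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Attributes are discarded entirely, so event handlers such as
        # onclick never reach the output.
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always escaped before it is written back out.
        self.out.append(escape(data))


def sanitize(markup: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


dirty = '<p onclick="evil()">Hi <strong>there</strong></p><img src=x onerror=alert(1)>'
print(sanitize(dirty))  # <p>Hi <strong>there</strong></p>
```

The key design choice is that the output is rebuilt from the parsed tokens instead of being produced by deleting pieces of the input, so anything not explicitly recognized simply never gets written.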



Answer 4:

Rather than allow HTML, you should have some other markup that can be converted to HTML. Trying to strip out rogue HTML from user input is nearly impossible. For example:

<scr<script>ipt etc="...">

Removing <script> from this will leave

<script etc="...">
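The reassembly described above can be reproduced in a couple of lines. This is a Python illustration, where `strip_script_once` is a hypothetical single-pass filter that deletes the literal text `<script>`.

```python
def strip_script_once(markup: str) -> str:
    """Hypothetical single-pass filter: delete every literal '<script>'."""
    return markup.replace("<script>", "")


nested = '<scr<script>ipt etc="...">'
result = strip_script_once(nested)

print(result)  # <script etc="...">
```

Deleting the inner tag joins the two fragments around it into a new opening tag, which is why deletion-based filtering is so fragile.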


Answer 5:

For a C# example of the whitelist approach, which Stack Overflow uses, you can look at this page.



Answer 6:

Kohana's security helper is pretty good. From what I remember, it was taken from a different project.

I tested it with this input from LFSR Consulting's answer:

<IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>

and it escaped it correctly.



Answer 7:

If removing the tags is too difficult, you could reject the whole HTML input until the user submits a valid one. I would reject HTML that contains any of the following tags:

frameset, frame, iframe, script, object, embed, applet.

You will also want to disallow head (and its sub-tags), body, and html, because you provide those yourself and do not want the user to manipulate your metadata.

But generally speaking, allowing users to provide their own HTML always raises some security issues.
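The reject-on-sight check could look roughly like the following Python sketch. `TagCollector` and `find_forbidden_tags` are hypothetical names, and the forbidden set contains only the tags listed in this answer (the sub-tags of head are not enumerated here).

```python
from html.parser import HTMLParser

# Tags named in the answer above.
FORBIDDEN_TAGS = {
    "frameset", "frame", "iframe", "script", "object", "embed", "applet",
    "head", "body", "html",
}


class TagCollector(HTMLParser):
    """Record the name of every opening tag seen in the input."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <IFRAME> is recorded as iframe.
        self.tags.add(tag)


def find_forbidden_tags(markup: str) -> set:
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags & FORBIDDEN_TAGS


print(find_forbidden_tags("<p>fine</p>"))
print(find_forbidden_tags('<p>x</p><IFRAME src="a"></IFRAME>'))
```

A submission would be rejected whenever the returned set is non-empty. Note that a check on tag names alone says nothing about attributes, so an input such as `<img src=x onerror=alert(1)>` passes it, which is the weakness of blacklists that Answer 2 points out.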



Answer 8:

You might want to consider, rather than allowing HTML at all, implementing some stand-in for HTML such as BBCode or Markdown.
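As a rough illustration of why a stand-in markup is easier to make safe, here is a Python sketch of a tiny BBCode-style converter. The function name `bbcode_to_html` and the three supported tags are assumptions chosen for the example, not a full BBCode implementation. The point is the order of operations: escape everything first, then generate the only HTML that will ever appear in the output.

```python
import re
from html import escape

# Hypothetical mapping from a few BBCode tags to the HTML they produce.
SIMPLE_TAGS = {"b": "strong", "i": "em", "quote": "blockquote"}


def bbcode_to_html(text: str) -> str:
    # Step 1: neutralize any HTML the user typed.
    html_text = escape(text)
    # Step 2: turn the known BBCode pairs into HTML that we generate ourselves.
    for bb, tag in SIMPLE_TAGS.items():
        pattern = re.compile(rf"\[{bb}\](.*?)\[/{bb}\]", re.IGNORECASE | re.DOTALL)
        html_text = pattern.sub(rf"<{tag}>\1</{tag}>", html_text)
    return html_text


print(bbcode_to_html("[b]Hi[/b] <script>alert(1)</script>"))
# <strong>Hi</strong> &lt;script&gt;alert(1)&lt;/script&gt;
```

Because user input is never trusted as HTML, there is nothing to strip; every tag in the result was written by the converter.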



Answer 9:

I use PHP's strip_tags function because I want users to be able to post safely, and I allow only a few tags in posts. This way nobody can hack your website through script injection, so I think strip_tags is the best option.

Click here for the code for this PHP function



Answer 10:

This is a very good PHP function; you can use it like this:

$string = strip_tags($_POST['comment'], "<b>");