Are there any good PHP-based HTML filters available?

Posted 2019-02-04 16:24

Question:

I am currently working on a project with a PHP frontend. We're quite concerned about security, because we'll have a lot of users and are an attractive target for hackers. Our users can submit HTML-formatted content that is later visible to other users. This is a big problem because it exposes us to the whole range of XSS attacks. We're filtering as well as we can, but the variety of attack vectors is large.

So I'm searching for PHP-based HTML sanitizing/filtering solutions. Commercial solutions are fine (even preferred). Currently we're using a modified HTML Purifier, but we're not satisfied with the results.

What are some good libraries/tools that are capable of filtering malicious parts of HTML?

HTML5 awareness, for example, would be nice to have, since HTML5 will become a security nightmare once it's available "in the wild".

Update: We're doing an in-depth configuration of HTML Purifier. It looks like the older framework we used before was just not configuring it at all. Now the results look much better.

Answer 1:

HTML Purifier project

Personally, I have had very good results with the HTML Purifier project.

It is highly customizable and has a huge code base. The only issue is uploading the files to your server.

Are you sure the problem isn't a configuration issue with your installation? Configured correctly, the purifier should not let through any HTML tags that you have not allowed.

From the web site:

HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications.

Tired of using BBCode due to the current landscape of deficient or insecure HTML filters? Have a WYSIWYG editor but never been able to use it? Looking for high-quality, standards-compliant, open-source components for that application you're building? HTML Purifier is for you!

I wrote an article about how to use the HTML Purifier library with CodeIgniter.

Maybe this will help you give it another try:

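require_once 'HTMLPurifier.auto.php'; // adjust the path to your installation
// load the config and override defaults as necessary
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
$config->set('HTML.AllowedElements', 'a,em,blockquote,p,strong,pre,code');
$config->set('HTML.AllowedAttributes', 'a.href,a.title');
$config->set('HTML.TidyLevel', 'light');
// filter the user-submitted markup ($dirty_html is a placeholder)
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);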


Answer 2:

CodeIgniter has an excellent XSS filter; you could rip it out of the system/libraries/Input.php file if you wanted it as a standalone function.



Answer 3:

kses works well. You can easily specify which elements to allow and disallow, so making it ‘HTML5-aware’ would just be a matter of setting an array.

WordPress uses it, so I guess it’s pretty safe ;)
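
As a rough sketch of what "setting an array" looks like, the snippet below builds an allowlist in the element-to-attributes shape that kses works with and passes it to WordPress's copy of the function, `wp_kses()`. The particular elements chosen, the sample input, and the idea of adding `section`, `figure`, and `figcaption` as HTML5 entries are assumptions for illustration only. The call is guarded because `wp_kses()` only exists inside a WordPress installation.

```php
<?php
// Allowlist: element name => (attribute name => true).
// Elements and attributes that are not listed get removed by kses.
$allowed_html = array(
    'a'          => array('href' => true, 'title' => true),
    'em'         => array(),
    'strong'     => array(),
    'p'          => array(),
    'blockquote' => array(),
    'pre'        => array(),
    'code'       => array(),
    // Supporting newer HTML5 elements is a matter of adding entries
    // (these three are illustrative picks, not a recommended set):
    'section'    => array(),
    'figure'     => array(),
    'figcaption' => array(),
);

// Sample untrusted input (made up for this example).
$dirty = '<p onclick="evil()">Hi <script>alert(1)</script><em>there</em></p>';

// wp_kses() is only defined inside WordPress, so guard the call.
if (function_exists('wp_kses')) {
    echo wp_kses($dirty, $allowed_html), "\n";
} else {
    echo "wp_kses() is not available; run this inside WordPress.\n";
}
```

Keeping the allowlist in one array makes it easy to review exactly which tags and attributes users may submit.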



Answer 4:

I can really recommend kses for HTML filtering. That's actually what WordPress uses. It's free and open source.



Answer 5:

I've used this class before and had pretty decent success: http://www.phpclasses.org/browse/package/2189.html



Answer 6:

You can keep your current solution and display the user content in iframes served from a different base URL. When the iframe's host differs from the main page's host, the browser's same-origin policy stops JavaScript inside the iframe from accessing the main page. For example, if your page is at http://www.yoururl.com/thread/500, the iframes could load their content from URLs such as http://yoururl.com/thread/500/coment/1 and http://yoururl.com/thread/500/coment/2.

Which base URLs you can use depends on your DNS/host configuration.

This doesn't fix the problem; it only works around it. Still, it can be useful until you find something else.
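
A minimal sketch of the iframe approach described above, assuming the user content is reachable on a second host. The helper name `comment_frame()`, its parameters, and the URL layout (taken from the example URLs in this answer) are hypothetical; nothing here comes from a library.

```php
<?php
// Hypothetical helper: builds an iframe that loads one comment from a
// separate host. Name, parameters, and URL layout are assumptions.
function comment_frame(string $contentHost, int $threadId, int $commentId): string
{
    // The host must differ from the main page's host so that the browser
    // treats the frame as a separate origin.
    $src = sprintf('http://%s/thread/%d/coment/%d', $contentHost, $threadId, $commentId);

    // Escape the URL before placing it inside an HTML attribute.
    return '<iframe src="' . htmlspecialchars($src, ENT_QUOTES, 'UTF-8') . '"></iframe>';
}

// The main page lives on www.yoururl.com; the frames load from yoururl.com.
echo comment_frame('yoururl.com', 500, 1), "\n";
```

The isolation comes entirely from the host mismatch, so serving the frame from the same host as the main page would defeat the purpose.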



Answer 7:

HTMLPurifier probably works, but let me just say that the folder structure is over-complicated and pompous: hundreds of lines of comments, a folder called "test", a license file, read-mes and info files, images, ANOTHER folder for smoketesting (which is downright abusive), extras, configs, and benchmarks. To top it all off, there are about 10 different CMS compatibility modes, testimonials on their website, full versions, lite versions, and every programmatic variation in between.