What does HTML Purifier do that secure PHP program

Posted 2019-02-01 08:31

Question:

I'm researching PHP security best practices and specifically the HTML Purifier library.

I like the idea of using a third-party library to help strengthen the security of my sites, but I'm confused about a few things...

  1. First, a general question... What does HTML Purifier do that practicing secure PHP programming can't?

  2. If I'm using HTML Purifier, does that mean I get to skip common security measures like using PHP functions to filter input and escape output?

  3. One of the response comments for this question seems to suggest that HTML Purifier is only needed for elements that allow HTML tags, such as WYSIWYG editors. Is this correct?

  4. Has anyone noticed a performance lag from using HTML Purifier? This article makes it seem like performance impact is worth considering.

  5. Are there any up-to-date tutorials on integrating HTML Purifier with a non-framework PHP application? Everything I've found is either old or framework-specific.

Just to confirm that I've done my homework before asking this...

  • This question is essentially the same as mine, but the lone response seems to just list another best practice that the asker forgot to mention

  • This 'bountiful' question is a terrific resource about HTML Purifier and HTML5, but assumes foundational knowledge

  • This comparison page on HTML Purifier's site is more of a comparison to other filters

Answer 1:

There are two extremes when accepting any input from your users:

  1. Indiscriminately escape everything to HTML entities, so the user can inject nothing. This is safe against HTML injection, but gives the user no freedom to add any HTML, for example to bold text.
  2. Output the content as you received it from the user. This allows the user to <b>bold text</b>, but also to inject scripts or mess with your HTML in any other form the user desires, intentionally or unintentionally.
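To make the two extremes concrete, here is a minimal sketch using only PHP's built-in `htmlspecialchars()`. The `$input` string is a made-up example, not something from the question.

```php
<?php
// Hypothetical user input, used only for illustration.
$input = 'Hello <b>world</b><script>alert(1)</script>';

// Extreme 1: escape everything. All markup becomes literal text,
// so neither the <b> nor the <script> is interpreted by the browser.
$escaped = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// Extreme 2: output the content as received. The <b> works,
// but the <script> would run in the visitor's browser too.
$raw = $input;

echo $escaped, PHP_EOL;
echo $raw, PHP_EOL;
```

The first `echo` prints `Hello &lt;b&gt;world&lt;/b&gt;&lt;script&gt;alert(1)&lt;/script&gt;`; the second prints the input unchanged, script and all.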

HTML Purifier offers a middle ground: let the user submit some HTML, but not malicious HTML. That's a messy thing to attempt, of course, but HTML Purifier is purportedly one of the few libraries, if not the only one, that gets it right.
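As a rough sketch of that middle ground in a non-framework script: the following assumes HTML Purifier was installed through Composer (package `ezyang/htmlpurifier`) and that the autoloader lives in `vendor/` next to the script. The whitelist in `HTML.Allowed` and the sample strings are illustrative choices, not recommendations.

```php
<?php
// Assumption: HTML Purifier was installed with Composer, so its classes
// are available through the Composer autoloader at this path.
require_once __DIR__ . '/vendor/autoload.php';

// Start from the library defaults, then restrict the allowed markup.
$config = HTMLPurifier_Config::createDefault();
// Illustrative whitelist: a few inline tags plus links with an href.
$config->set('HTML.Allowed', 'b,i,em,strong,a[href]');

$purifier = new HTMLPurifier($config);

// Hypothetical rich-text input, e.g. from a WYSIWYG editor.
$dirty = 'Hello <b>world</b><script>alert(1)</script>';
$clean = $purifier->purify($dirty);

// The whitelisted <b> survives; the <script> element does not.
echo $clean, PHP_EOL;

// Fields that are not meant to contain HTML are still escaped as usual.
$username = 'Tom & "Jerry"';
echo htmlspecialchars($username, ENT_QUOTES, 'UTF-8'), PHP_EOL;
```

Only the field that is supposed to carry markup goes through `purify()`; everything else keeps the normal escape-on-output treatment.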

That's the only thing it's supposed to be used for. Don't drop your other security practices. In fact, I'd avoid the whole issue by letting users style their input only with a controlled markup language, such as Markdown (which Stack Overflow uses).