Why is filter_input() incomplete?

Posted 2019-02-08 14:13

Question:

I am working a lot on a PHP-based CMS at the moment, and while I'm at it I would like to move all the handling and sanitation of user input to one central place. (At the moment, it's a $_REQUEST here, a $_GET there, and so on).

I like filter_input() very much and would like to use it for basic sanitation, but I'm unclear as to whether this function is really production ready. For example, the documentation names the following parameters for $type:

INPUT_GET, INPUT_POST, INPUT_COOKIE, INPUT_SERVER, INPUT_ENV, INPUT_SESSION (not implemented yet) and INPUT_REQUEST (not implemented yet).

The function has existed since PHP 5.2.0, so why are two crucial elements not implemented yet? If I want to fetch data from $_REQUEST, I have to use a workaround from the user-contributed notes. Is there a special reason for this? Is this function still in some kind of beta? Is it trustworthy as the first call to handle incoming data?

Maybe somebody familiar with the PHP development process can shed some light on this.

Answer 1:

"I would like to move all the handling and sanitation of user input to one central place"

Yes, how lovely that would be. It can't be done. That's not how text processing works.

If you're inserting text from one context into another you need to use the right escapes. (mysql_real_escape_string for MySQL string literals, htmlspecialchars for HTML content, urlencode for URL parameters, others for specific contexts). At the start of your script when you're filtering, you don't know where your input is going to end up, so you don't know how to escape it.
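As a small illustration of why the target context decides the escape, here is a sketch that takes one raw string and escapes it separately for HTML and for a URL parameter. The sample value and the variable names are made up for the example.

```php
<?php
// The same raw input, escaped separately for each output context.
$raw = 'Tom & Jerry?';

// HTML text content: the ampersand becomes an entity.
$forHtml = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// URL query parameter: & and ? are percent-encoded, spaces become +.
$forUrl = urlencode($raw);

echo $forHtml, "\n"; // Tom &amp; Jerry?
echo $forUrl, "\n";  // Tom+%26+Jerry%3F
```

Neither result is usable in the other context, which is why the escape has to be chosen at the point of output rather than at the point of input.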

Maybe one input string is going both into the database (needs to be SQL-escaped) and directly onto the page (needs to be HTML-escaped). There's no one escape that covers both those cases. You can use both escapes one after the other, but then the value in the HTML will have weird backslashes appearing in it and the copy in the database will be full of &amp; entities. A few rounds of this misencoding and you get that situation where every time you edit something, long strings of \\\\\\\\\\\\\\\\\\\\ and &amp;amp;amp;amp; come out.
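The compounding effect can be reproduced with a short sketch. It assumes a hypothetical sanitize_up_front() helper that applies both escapes on every save, and it uses addslashes() purely as a stand-in for a database-specific escape. Neither is a recommendation.

```php
<?php
// Hypothetical "sanitize everything up front" step, run on every save.
// addslashes() is only a stand-in for a database-specific escape here.
function sanitize_up_front(string $value): string
{
    return htmlspecialchars(addslashes($value), ENT_QUOTES, 'UTF-8');
}

// Simulate a value being loaded, edited and saved $rounds times.
function edit_rounds(string $value, int $rounds): string
{
    for ($i = 0; $i < $rounds; $i++) {
        $value = sanitize_up_front($value);
    }
    return $value;
}

$original = 'R&D \\ test'; // one ampersand, one backslash
echo edit_rounds($original, 1), "\n";
echo edit_rounds($original, 3), "\n";
```

Each round doubles the backslashes and adds another "amp;" after the ampersand, so after three rounds the single backslash has become eight.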

The only way you can safely filter in one go at start time is by completely removing all characters that need to be escaped in any of the contexts you're going to be using them in. But that means no apostrophes or backslashes in your HTML, no ampersands or less-thans in your database, and probably a whole load of other URL-unfriendly punctuation has to go too. For a simple site that doesn't take arbitrary text you could maybe get away with that. But usually not.

So you can only escape on the fly when one type of text goes into another. The best strategy to avoid the problem is to avoid concatenating text into other contexts as much as you possibly can, for example by using parameterised queries instead of SQL string building, and either defining an echo(htmlspecialchars()) function with a nice short name to make it less work to type, or using an alternative templating system that HTML-escapes by default.
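A minimal sketch of both suggestions, assuming PDO as the database layer. The helper name h(), the function find_pages_by_title() and the pages table are all hypothetical and chosen only for illustration.

```php
<?php
// Short helper for HTML output; the name h() is an arbitrary choice.
function h(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Parameterised query: the value travels separately from the SQL text,
// so no manual SQL escaping is needed.
// Table and column names are hypothetical.
function find_pages_by_title(PDO $pdo, string $title): array
{
    $stmt = $pdo->prepare('SELECT id, title FROM pages WHERE title = :title');
    $stmt->execute([':title' => $title]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Escape at the moment the text enters the HTML context.
echo '<p>', h("Tom & Jerry's <b>page</b>"), "</p>\n";
```

The raw value stays untouched in the database and is escaped only when it enters a specific context.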



Answer 2:

"input filtering" or "sanitation" is an absurd idea. Stay away from it.

Explanations and further discussion

What's the best method for sanitizing user input with PHP?

What else should I be doing to sanitize user input?



Answer 3:

In programming, you should be as restrictive about your input as possible. That goes for data sources as well. $_REQUEST merges the contents of $_GET, $_POST and, depending on the request_order and variables_order ini settings, $_COOKIE, which may lead to problems.

Think, for example, about what happens if a plugin of your CMS introduces a new special key in one of them that already exists as a meaningful key in another plugin.

So DON'T ever use $_REQUEST. Use $_GET, $_POST or $_COOKIE, whichever fits your scenario. It's a good practice to be as strict as possible, and that has nothing to do with PHP, but with programming in general.
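One way to apply this is to read each value from an explicitly named source and validate it strictly. The sketch below assumes a hypothetical int_param() helper and a "page" parameter. It uses filter_var() on a given array so that the source is always spelled out by the caller.

```php
<?php
// Hypothetical helper: read a positive integer from one explicit source
// array and fall back to a default when it is missing or invalid.
function int_param(array $source, string $key, int $default): int
{
    if (!isset($source[$key])) {
        return $default;
    }
    $value = filter_var(
        $source[$key],
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1]]
    );
    // filter_var() returns false when validation fails.
    return $value === false ? $default : $value;
}

// The caller names the source; $_REQUEST is never consulted.
$page = int_param($_GET, 'page', 1);
echo $page, "\n";
```

This validates rather than escapes, so it complements output escaping instead of replacing it.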



Tags: php security