As a general rule it's best to use whitelist validation since it's easier to accept only characters you know should go there, for example if you have a field where the user inputs his/her phone number you could just do a regex and check that the values received are only numbers, drop everything else and just store the numbers. Note that you should proceed to validate the resulting numbers as well. Blacklist validation is weaker because a skilled attacker could evade your validation functions or send values that your function did not expect, from OWASP "Sanitize with Blacklist":
Eliminate or translate characters (such as to HTML entities or to remove quotes) in an effort to make the input "safe". Like blacklists, this approach requires maintenance and is usually incomplete. As most fields have a particular grammar, it is simpler, faster, and more secure to simply validate a single correct positive test than to try to include complex and slow sanitization routines for all current and future attacks.
Realize that this validation is just a first front defense against attacks. For XSS you should always "Escape" your output so you can print any character's needed but they are escaped meaning that they are changed to their HTML entity and thus the browser knows it's data and not something that the parser should interpret thus effectively shutting down all XSS attacks. For SQL injections escape all data before storing it, try to never use dynamic queries as they are the easiest type of query to exploit. Try to use parameterized store procedures. Also remember to use connections relevant to what the connection has to do. If the connection only needs to read data, create a db account with only "Read" privileges this depends mostly on the roles of the users. For more information please check the links from where this information was extracted from:
Let me explain your question with few more question and answer.
Blacklist VS Whitelist restriction
i. A Blacklist XSS and SQL Injection handling verifies a desired input against a list of negative input's. Basically one would compile a list of all the negative or bad conditions, and verifies that the input received is not one among the bad or negative conditions.
ii. A Whitelist XSS and SQL Injection handling verifies a desired input against a list of possible correct input's. To do this one would compile a list of all the good/positive input values/conditions, and verifies that the input received is one among the correct conditions.
Which one is better to have?
i. An attacker will use any possible means to gain access to your application. This includes trying all sort of negative or bad conditions, various encoding methods, and appending malicious input data to valid data. Do you think you can think of every possible bad permutation that could occur?
ii. A Whitelist is the best way to validate input. You will know exacty what is desired and that there is not any bad types accepted. Typically the best way to create a whitelist is with the use of regular expression's. Using regular expressions is a great way to abstract the whitelisting, instead of manually listing every possible correct value.
Build a good regular expression. Just because you are using a regular expression does not mean bad input will not be accepted. Make sure you test your regular expression and that invalid input cannot be accepted by your regular expression.
For inputs with clearly defined parameters (say the equivalent of a dropdown menu), I would whitelist the options and ignore anything that wasn't one of those.
For free-text inputs, it's significantly more difficult. I subscribe to the school of thought that you should just filter it as best you can so it's as safe as possible (escape HTML, etc). Some other suggestions would be to specifically disallow any invalid input - however, while this might protect against attacks, it might also affect usability for genuine users.
I think it's just a case of finding the blend that works for you. I can't think of any one solution that would work for all possibilities. Mostly it depends on your userbase.
I think whitelisting is the desired approach, however I never met a real whitelist HTML form validation. For example here is a symfony 1.x form with validation from the documentation:
class ContactForm extends sfForm
{
protected static $subjects = array('Subject A', 'Subject B', 'Subject C');
public function configure()
{
$this->setWidgets(array(
'name' => new sfWidgetFormInput(),
'email' => new sfWidgetFormInput(),
'subject' => new sfWidgetFormSelect(array('choices' => self::$subjects)),
'message' => new sfWidgetFormTextarea(),
));
$this->widgetSchema->setNameFormat('contact[%s]');
$this->setValidators(array(
'name' => new sfValidatorString(array('required' => false)),
'email' => new sfValidatorEmail(),
'subject' => new sfValidatorChoice(array('choices' => array_keys(self::$subjects))),
'message' => new sfValidatorString(array('min_length' => 4)),
));
}
}
What you cannot see, that it accepts new inputs without validation settings and it does not check the presence of inputs which are not registered in the form. So this is a blacklist input validation. By whitelist you would define an input validator first, and only after that bind an input field to that validator. By a blacklist approach like this, it is easy to forget to add a validator to an input, and it works perfectly without that, so you would not notice the vulnerability, only when it is too late...
A hypothetical whitelist approach would look like something like this:
class ContactController {
/**
* @input("name", type = "string", singleLine = true, required = false)
* @input("email", type = "email")
* @input("subject", type = "string", alternatives = ['Subject A', 'Subject B', 'Subject C'])
* @input("message", type = "string", range = [4,])
*/
public function post(Inputs $inputs){
//automatically validates inputs
//throws error when an input is not on the list
//throws error when an input has invalid value
}
}
/**
* @controller(ContactController)
* @method(post)
*/
class ContactForm extends sfFormX {
public function configure(InputsMeta $inputs)
{
//automatically binds the form to the input list of the @controller.@method
//throws error when the @controller.@method.@input is not defined for a widget
$this->addWidgets(
new sfWidgetFormInput($inputs->name),
new sfWidgetFormInput($inputs->email),
new sfWidgetFormSelect($inputs->subject),
new sfWidgetFormTextarea($inputs->message)
);
$this->widgetSchema->setNameFormat('contact[%s]');
}
}
The best approach is to either use stored procedures or parameterized queries. White listing is an additional technique that is ok to prevent any injections before they reach the server, but should not be used as your primary defense. Black listing is usually a bad idea because it's usually impossible to filter out all malicious inputs.
BTW, this answer is considering you mean sanitizing as in preventing sql injection.
Personally, I gauge the number of allowed or disallowed characters and go from there. If there are more allowed chars than disallowed, then blacklist. Else whitelist. I don't believe that there is any 'standard' that says you should do it one way or the other.
BTW, this answer is assuming you want to limit inputs into form fields such as phone numbers or names :) @posterBelow
As a general rule it's best to use whitelist validation since it's easier to accept only characters you know should go there, for example if you have a field where the user inputs his/her phone number you could just do a regex and check that the values received are only numbers, drop everything else and just store the numbers. Note that you should proceed to validate the resulting numbers as well. Blacklist validation is weaker because a skilled attacker could evade your validation functions or send values that your function did not expect, from OWASP "Sanitize with Blacklist":
Realize that this validation is just a first front defense against attacks. For XSS you should always "Escape" your output so you can print any character's needed but they are escaped meaning that they are changed to their HTML entity and thus the browser knows it's data and not something that the parser should interpret thus effectively shutting down all XSS attacks. For SQL injections escape all data before storing it, try to never use dynamic queries as they are the easiest type of query to exploit. Try to use parameterized store procedures. Also remember to use connections relevant to what the connection has to do. If the connection only needs to read data, create a db account with only "Read" privileges this depends mostly on the roles of the users. For more information please check the links from where this information was extracted from:
Data Validation OWASP
Guide to SQL Injection OWASP
Let me explain your question with few more question and answer.
Blacklist VS Whitelist restriction
i. A Blacklist XSS and SQL Injection handling verifies a desired input against a list of negative input's. Basically one would compile a list of all the negative or bad conditions, and verifies that the input received is not one among the bad or negative conditions.
ii. A Whitelist XSS and SQL Injection handling verifies a desired input against a list of possible correct input's. To do this one would compile a list of all the good/positive input values/conditions, and verifies that the input received is one among the correct conditions.
Which one is better to have?
i. An attacker will use any possible means to gain access to your application. This includes trying all sort of negative or bad conditions, various encoding methods, and appending malicious input data to valid data. Do you think you can think of every possible bad permutation that could occur?
ii. A Whitelist is the best way to validate input. You will know exacty what is desired and that there is not any bad types accepted. Typically the best way to create a whitelist is with the use of regular expression's. Using regular expressions is a great way to abstract the whitelisting, instead of manually listing every possible correct value.
Build a good regular expression. Just because you are using a regular expression does not mean bad input will not be accepted. Make sure you test your regular expression and that invalid input cannot be accepted by your regular expression.
The answer generally is, it depends.
For inputs with clearly defined parameters (say the equivalent of a dropdown menu), I would whitelist the options and ignore anything that wasn't one of those.
For free-text inputs, it's significantly more difficult. I subscribe to the school of thought that you should just filter it as best you can so it's as safe as possible (escape HTML, etc). Some other suggestions would be to specifically disallow any invalid input - however, while this might protect against attacks, it might also affect usability for genuine users.
I think it's just a case of finding the blend that works for you. I can't think of any one solution that would work for all possibilities. Mostly it depends on your userbase.
I think whitelisting is the desired approach, however I never met a real whitelist HTML form validation. For example here is a symfony 1.x form with validation from the documentation:
What you cannot see, that it accepts new inputs without validation settings and it does not check the presence of inputs which are not registered in the form. So this is a blacklist input validation. By whitelist you would define an input validator first, and only after that bind an input field to that validator. By a blacklist approach like this, it is easy to forget to add a validator to an input, and it works perfectly without that, so you would not notice the vulnerability, only when it is too late...
A hypothetical whitelist approach would look like something like this:
The best approach is to either use stored procedures or parameterized queries. White listing is an additional technique that is ok to prevent any injections before they reach the server, but should not be used as your primary defense. Black listing is usually a bad idea because it's usually impossible to filter out all malicious inputs.
BTW, this answer is considering you mean sanitizing as in preventing sql injection.
Personally, I gauge the number of allowed or disallowed characters and go from there. If there are more allowed chars than disallowed, then blacklist. Else whitelist. I don't believe that there is any 'standard' that says you should do it one way or the other.
BTW, this answer is assuming you want to limit inputs into form fields such as phone numbers or names :) @posterBelow