Ok, so I am aware of methods of input sanitization/checking such as using a whitelist, blacklist, mysqli_escape (or whatever it is in PHP), but say you have a site where, for some reason, instead of having users upload code (HTML, Javascript, C, etc.) inside of a file, they submit snippets directly. However, like any good web admin, you don't want to allow SQL or other types of code injection.
As I understand it, using something like mysql_escape will escape all special characters, keywords, etc., so we're covered there. However, what about HTML, Javascript, PHP, and other code?
My thinking is that if text, even if it is code, is stored in a database, that doesn't mean it is just going to execute itself. For HTML, Javascript or PHP, it would only be executed once the column value is selected and used for generating a webpage. So, would it be enough to do html_escape when serving the page to prevent code execution? I suppose you could html_escape before putting the data into the database, but what if the code snippets were retrieved using something other than a web browser? For example, what if the site's main purpose was a code versioning system and the code needs to remain intact?
You will find a huge amount of hurt if you try to put the escaped text into the database. Instead, store exactly what they typed in, and be sure to wrap it properly when presenting it. See Joel on Software for a reasonable method of approaching this from a coding standpint.
Also, avoid manually calling mysql_escape
. You should be using prepare
to create a SQL statement with placeholders, such as "INSERT INTO code (texty_part) VALUES (?)"
, and then running $sth->execute($texty_part)
to do the insertion.
mysql_real_escape_string
escapes characters that are special regarding MySQL string literals:
\
, backspace character
'
, single quote character
"
, double quote character
\0
, NUL character
\n
, line break character
\r
, carriage return character
^Z
, Control-Z, substitute character
However, it handles any data equally regardless of what the input data is actually representing. mysql_real_escape_string
does only consider the character encoding in effect.
Regarding your concern: HTML, JavaScript and PHP are all interpreted languages and require an interpreter for interpretation/execution. As for HTML and JavaScript, it’s typically a web browser. And as for PHP, it’s typically the PHP runtime on a web server. But MySQL server can only interpret one language: the SQL dialect also known as MySQL.
This is also the reason why one wouldn’t handle other languages besides MySQL when inserting data into the database. Neither would one escape/encode/sanitize/insert-preferred-verb-here HTML, nor JavaScript, nor PHP, nor any other language besides MySQL.
You would only escape/encode/sanitize/insert-preferred-verb-here any of those languages when you hand the data down to a component that may interpret that specific language.
As I understand it, using something like mysql_escape will escape all special characters, keywords, etc.
You understand it wrong. "mysql_escape" doesn't make "all special characters, keywords, etc." "harmless", as you (after wast masses of PHP users) believe. The purpose of this honest function is totally different.
For HTML, Javascript or PHP, it would only be executed once the column value is selected and used for generating a webpage.
Save for PHP which won't be executed unless you call eval() on it explicitly
I suppose you could html_escape before putting the data into the database, but what if the code snippets were retrieved using something other than a web browser?
html_escape it before sending to browser then?
Do both
1. Input Sanitize.
2. Out put encoding before displaying