The standard approach for using variable values in SQLite queries is the "question mark style", like this:
import sqlite3
with sqlite3.connect(":memory:") as connection:
connection.execute("CREATE TABLE foo(bar)")
connection.execute("INSERT INTO foo(bar) VALUES (?)", ("cow",))
print(list(connection.execute("SELECT * from foo")))
# prints [(u'cow',)]
However, this only works for substituting values into queries. It fails when used for table or column names:
import sqlite3
with sqlite3.connect(":memory:") as connection:
connection.execute("CREATE TABLE foo(?)", ("bar",))
# raises sqlite3.OperationalError: near "?": syntax error
Neither the sqlite3
module nor PEP 249 mention a function for escaping names or values. Presumably this is to discourage users from assembling their queries with strings, but it leaves me at a loss.
What function or technique is most appropriate for using variable names for columns or tables in SQLite? I'd would strongly prefer to do able to do this without any other dependencies, since I'll be using it in my own wrapper.
I looked for but couldn't find a clear and complete description of the relevant part of SQLite's syntax, to use to write my own function. I want to be sure this will work for any identifier permitted by SQLite, so a trial-and-error solution is too uncertain for me.
SQLite uses "
to quote identifiers but I'm not sure that just escaping them is sufficient. PHP's sqlite_escape_string
function's documentation suggests that certain binary data may need to be escaped as well, but that may be a quirk of the PHP library.
If you're quite certain that you need to specify column names dynamically, you should use a library that can do so safely (and complains about things that are wrong). SQLAlchemy is very good at that.
foo_table
now represents the table with the dynamic schema, but you can only use it in the context of an actual database connection (so that sqlalchemy knows the dialect, and what to do with the generated sql).You can then issue the
CREATE TABLE ...
. withecho=True
, sqlalchemy will log the generated sql, but in general, sqlalchemy goes out of its way to keep the generated sql out of your hands (lest you consider using it for evil purposes).and yes, sqlalchemy will take care of any column names that need special handling, like when the column name is a sql reserved word
and can save you from possible badness:
(apparently, some strange things are perfectly legal identifiers in sqlite)
If you find that you need a variable entity name (either relvar or field) then you probably are doing something wrong. an alternative pattern would be to use a property map, something like:
Then, you just specify the name dynamically when doing an insert instead of a column.
The first thing to understand is that table/column names cannot be escaped in the same sense than you can escape strings stored as database values.
The reason is that you either have to:
Having understood that, the second thing to understand is that how you will end up "escaping" table/column names depends on your specific context, and so there is more than one way to do this, but whatever the way, you'll need to dig up to figure out exactly what is or is not an acceptable column/table name in sqlite.
To get you started, here is one condition:
Even better, using certain column names can have unintended side effects:
Both quoted texts are from http://www.sqlite.org/lang_createtable.html
To convert any string into a SQLite identifier:
"
with""
.Implementation
Given a string single argument, it will escape and quote it correctly or raise an exception. The second argument can be used to specify any error handler registered in the
codecs
module. The built-in ones are:This doesn't check for reserved identifiers, so if you try to create a new
SQLITE_MASTER
table it won't stop you.Example Usage
Observations and References
TEXT
, not binary.SQLITE_MASTER
schema in the FAQstr
s, notbytes
.sqlite3
can handle any other unicode string as long as it can be properly encoded to UTF-8. Invalid strings could cause crashes between Python 3.0 and Python 3.1.2 or thereabouts. Python 2 accepted these invalid strings, but this is considered a bug.The
psycopg2
documentation explicitly recommends using normal python % or {} formatting to substitute in table and column names (or other bits of dynamic syntax), and then using the parameter mechanism to substitute values into the query.I disagree with everyone who is saying "don't ever use dynamic table/column names, you're doing something wrong if you need to". I write programs to automate stuff with databases every day, and I do it all the time. We have lots of databases with lots of tables, but they are all built on repeated patterns, so generic code to handle them is extremely useful. Hand-writing the queries every time would be far more error prone and dangerous.
It comes down to what "safe" means. The conventional wisdom is that using normal python string manipulation to put values into your queries is not "safe". This is because there are all sorts of things that can go wrong if you do that, and such data very often comes from the user and is not in your control. You need a 100% reliable way of escaping these values properly so that a user cannot inject SQL in a data value and have the database execute it. So the library writers do this job; you never should.
If, however, you're writing generic helper code to operate on things in databases, then these considerations don't apply as much. You are implicitly giving anyone who can call such code access to everything in the database; that's the point of the helper code. So now the safety concern is making sure that user-generated data can never be used in such code. This is a general security issue in coding, and is just the same problem as blindly
exec
ing a user-input string. It's a distinct issue from inserting values into your queries, because there you want to be able to safely handle user-input data.So my recommendation is: do whatever you want to dynamically assemble your queries. Use normal python string templating to sub in table and column names, glue on where clauses and joins, all the good (and horrible to debug) stuff. But make sure you're aware that whatever values such code touches has to come from you, not your users[1]. Then you use SQLite's parameter substitution functionality to safely insert user-input values into your queries as values.
[1] If (as is the case for a lot of the code I write) your users are the people who have full access to databases anyway and the code is to simplify their work, then this consideration doesn't really apply; you probably are assembling queries on user-specified tables. But you should still use SQLite's parameter substitution to save yourself from the inevitable genuine value that eventually contains quotes or percent signs.
As of
psycopg2
version 2.7 (released Feb 2017), column names and table names (identifiers) can be generated on the fly in a safe way usingpsycopg2.sql
. Here is a link to the documentation with examples: http://initd.org/psycopg/docs/sql.html.So the way to write the query in your question would be: