I want to pass a table name as a parameter in a Postgres function. I tried this code:
CREATE OR REPLACE FUNCTION some_f(param character varying) RETURNS integer
AS $$
BEGIN
IF EXISTS (select * from quote_ident($1) where quote_ident($1).id=1) THEN
return 1;
END IF;
return 0;
END;
$$ LANGUAGE plpgsql;
select some_f('table_name');
And I got this:
ERROR: syntax error at or near "."
LINE 4: ...elect * from quote_ident($1) where quote_ident($1).id=1)...
^
********** Error **********
ERROR: syntax error at or near "."
And here is the error I got when changed to this select * from quote_ident($1) tab where tab.id=1
:
ERROR: column tab.id does not exist
LINE 1: ...T EXISTS (select * from quote_ident($1) tab where tab.id...
Probably, quote_ident($1)
works, because without the where quote_ident($1).id=1
part I get 1
, which means something is selected. Why may the first quote_ident($1)
work and the second one not at the same time? And how could this be solved?
The first doesn't actually "work" in the sense that you mean, it works only in so far as it does not generate an error.
Try
SELECT * FROM quote_ident('table_that_does_not_exist');
, and you will see why your function returns 1: the select is returning a table with one column (namedquote_ident
) with one row (the variable$1
or in this particular casetable_that_does_not_exist
).What you want to do will require dynamic SQL, which is actually the place that the
quote_*
functions are meant to be used.If you want table name, column name and value to be dynamically passed to function as parameter
use this code
Don't do this.
That's the answer. It's a terrible anti-pattern. What purpose does it serve? If the client knows the table it wants data from, then
SELECT FROM ThatTable
! If you've designed your database in a way that this is required, you've probably designed it wrong. If your data access layer needs to know if a value exists in a table, it is trivially easy to do the dynamic SQL part in that code. Pushing it into the database is not good.I have an idea: let's install a device inside elevators where you can type in the number of the floor you want. Then when you press "Go", it moves a mechanical hand over to the correct button for the desired floor and presses it for you. Revolutionary!
Apparently my answer was too short on explanation so I am repairing this defect with more detail.
I had no intention of mockery. My silly elevator example was the very best device I could imagine for succinctly pointing out the flaws of the technique suggested in the question. That technique adds a completely useless layer of indirection, and needlessly moves table name choice from a caller space using a robust and well-understood DSL (SQL) into a hybrid using obscure/bizarre server-side SQL code.
Such responsibility splitting through movement of query construction logic into dynamic SQL makes the code harder to understand. It destroys a perfectly reasonable convention (how a SQL query chooses what to select) in the name of custom code fraught with potential for error.
Dynamic SQL offers the possibility of SQL injection that is hard to recognize in the front end code or the back end code singly (one must inspect them together to see this).
Stored procedures and functions can access resources that the SP/function owner has rights to but the caller doesn't. As far as I understand, when you use code that produces dynamic SQL and runs it, the database executes the dynamic SQL under the rights of the caller. This means you either won't be able to use privileged objects at all, or you have to open them up to all clients, increasing the surface area of potential attack to privileged data. Setting the SP/function at creation time to always run as a particular user (in SQL Server,
EXECUTE AS
) may solve that problem, but makes things more complicated. This exacerbates the risk of SQL injection mentioned in the previous point, by making the dynamic SQL a very enticing attack vector.When a developer must understand what the application code is doing in order to modify it or fix a bug, he'll find it very difficult to get the exact SQL query being executed. SQL profiler can be used, but this takes special privileges and can have negative performance effects on production systems. The executed query can be logged by the SP but this increases complexity for no reason (maintaining new tables, purging old data, etc.) and is totally non-obvious. In fact, some applications are architected such that the developer does not have database credentials, so it becomes almost impossible for him to actually see the query being submitted.
When an error occurs, such as when you try to select a table that doesn't exist, you'll get a message along the lines of "invalid object name" from the database. That will happen exactly the same whether you're composing the SQL in the back end or the database, but the difference is, some poor developer who's trying to troubleshoot the system has to spelunk one level deeper into yet another cave below the one where the problem actually exists, to dig into the wonder-procedure that Does It All and try to figure out what the problem is. Logs won't show "Error in GetWidget", it will show "Error in OneProcedureToRuleThemAllRunner". This abstraction will just make your system worse.
Here's a far better example in pseudo-C# of switching table names based on a parameter:
Every flaw I have mentioned with the other technique is completely absent from this example.
There's just no purpose, no benefit, no possible improvement in submitting a table name to a stored procedure.
This can be further simplified and improved:
Call with schema-qualified name (see below):
Or:
Major points
Use an
OUT
parameter to simplify the function. You can directly select the result of the dynamic SQL into it and be done. No need for additional variables and code.EXISTS
does exactly what you want. You gettrue
if the row exists orfalse
otherwise. There are various ways to do this,EXISTS
is typically most efficient.You seem to want an integer back, so I cast the
boolean
result fromEXISTS
tointeger
, which yields exactly what you had. I would return boolean instead.I use the object identifier type
regclass
as input type for_tbl
. That does everythingquote_ident(_tbl)
orformat('%I', _tbl)
would do, but better, because:.. it prevents SQL injection just as well.
.. it fails immediately and more gracefully if the table name is invalid / does not exist / is invisible to the current user. (A
regclass
parameter is only applicable for existing tables.).. it works with schema-qualified table names, where a plain
quote_ident(_tbl)
orformat(%I)
would fail because they cannot resolve the ambiguity. You would have to pass and escape schema and table names separately.I still use
format()
, because it simplifies the syntax (and to demonstrate how it's used), but with%s
instead of%I
. Typically, queries are more complex soformat()
helps more. For the simple example we could as well just concatenate:No need to table-qualify the
id
column while there is only a single table in theFROM
list. No ambiguity possible in this example. (Dynamic) SQL commands insideEXECUTE
have a separate scope, function variables or parameters are not visible there - as opposed to plain SQL commands in the function body.Tested with PostgreSQL 9.1.
format()
requires at least that version.Here's why you always escape user input for dynamic SQL properly:
SQL Fiddle demonstrating SQL injection
I have 9.4 version of PostgreSQL and I always use this code:
And then:
It works good for me.
Attention! Above example is one of those which shows "How do not if we want to keep safety during querying the database" :P
Inside plpgsql code, The EXECUTE statement must be used for queries in which table names or columns come from variables. Also the
IF EXISTS (<query>)
construct is not allowed whenquery
is dynamically generated.Here's your function with both problems fixed: