This question is an exact duplicate of:
- Improving a function that UPSERTs based on an input array 1 answer
I'm working on paying back some technical debt this week, and it hit me that I have no idea how to make multi-value inserts safe from accidental or malicious SQL injections. We're on Postgres 11.4. I've got a test bed to work from that includes a small table with about 26K rows, here's the declaration for a small table I'm using for testing:
BEGIN;
DROP TABLE IF EXISTS "data"."item" CASCADE;
CREATE TABLE IF NOT EXISTS "data"."item" (
"id" uuid NOT NULL DEFAULT NULL,
"marked_for_deletion" boolean NOT NULL DEFAULT false,
"name_" citext NOT NULL DEFAULT NULL,
CONSTRAINT item_id_pkey
PRIMARY KEY ("id")
);
CREATE INDEX item_marked_for_deletion_ix_bgin ON "data"."item" USING GIN("marked_for_deletion") WHERE marked_for_deletion = true;
ALTER TABLE "data"."item" OWNER TO "user_change_structure";
COMMIT;
I've been inserting to this table, and many others, using multi-value inserts, along the lines of:
BEGIN;
INSERT
bundle up hundres or thousands of rows
ON CONFLICT do what I need
COMMIT or ROLLBACK on the client side
Works fine. But how do you make a multi-value statement safe? That's what I can't figure out. This is one of those areas where I can't reason about the problem well. I don't have the appetite, aptitude, or patience for hacking things. That I can't think up an exploit means nothing, I would suck as a hacker. And, for that matter, I'm generally more concerned about errors than evil in code, since I run into errors a whole lot more often.
The standard advice I see for safe insertion is to use a prepared statement. A prepared statement for an INSERT is pretty much a temporary, runtime function for interpolation on a code template. For me, it's simpler to write an actual function, like this one:
DROP FUNCTION IF EXISTS data.item_insert_s (uuid, boolean, citext);
CREATE OR REPLACE FUNCTION data.item_insert_s (uuid, boolean, citext)
RETURNS int
AS $$
INSERT INTO item (
id,
marked_for_deletion,
name_)
VALUES
($1,$2,$3)
ON CONFLICT(id) DO UPDATE SET
marked_for_deletion = EXCLUDED.marked_for_deletion,
name_ = EXCLUDED.name_;
SELECT 1; -- No clue what to return, but you have to return something.
$$ LANGUAGE sql;
ALTER FUNCTION data.item_insert_s(uuid, boolean, citext) OWNER TO user_bender;
All of that works, and I've tried some timing tests. I truncate the table, do a multi-value insert, truncate, do a series of function call inserts, and see what the difference is. I've tried multiple runs, doing the operations in different orders, etc. Both cases use a BEGIN/COMMIT block in the same way, so I'll end up with the same number of transactions on either test. The results vary more across tests than within them, but the multi-value insert is always faster. Congratulations to me for confirming the obvious.
Is there a way to safely do bulk inserts and updates? It occurred to me that I could write a function that takes an array or arrays, parse it out, and run the code in a loop within the function. I'd like to test that out, but get flummoxed by the Postgres array syntax. I've looked around, and it sounds like an array of objects and a foreach loop might be just what I'm after. I've looked around, and this is a topic that has been addressed, but I haven't found a straightforward example of how to prepare data for insertion, and the the unpacking of it. I'm suspecting that I won't be able to use SQL and a plain unnest() because 1) I want to safe the inputs and 2) I might have functions that don't take all of the fields in a table in their input.
To make things a bit easier, I'm fine with functions with fixed parameter lists, and array inputs with fixed formats. I'll write code generators for my various tables, so I don't need to make the Postgres-side code any more complex than necessary.
Thanks for any help!
Note: I got a message to explain why this question is different than my newer, related question:
Improving a function that UPSERTs based on an input array
Answer: Yes, it's the same starting point. In this question, I was asking about SQL injection, in the second question I was trying to focus on the array-input solution. I'm not quite sure when to split out new questions, and when to let questions turn into multi-part threads.