I would like to use the single SQL-statement
insert into T (...) select ... from T where ...
to copy a lot of datasets. My problem is that there are N:M-relationships from table T
to other tables and these have to be copied too. How can I do this, if I do not know which original dataset belongs to which copied dataset? Let me demonstrate by example.
Content of the database before:
Table T
:
ID | COL1 | COL2
-----------------
1 | A | B
2 | C | D
N:M-table references table U from table T (table U is not shown):
T | U
---------
1 | 100
1 | 101
2 | 100
2 | 102
My copy operation where [???] is the part I do not know:
insert into T (COL1, COL2) select COL1, COL2 from T
insert into NM (T, U) select [???]
Content of the database after:
Table T
:
ID | COL1 | COL2
-----------------
1 | A | B
2 | C | D
3 | A | B
4 | C | D
N:M-table:
T | U
---------
1 | 100
1 | 101
2 | 100
2 | 102
3 | 100
3 | 101
4 | 100
4 | 102
Notice:
- I have thousands of datasets (not just two)
- I want to use 'insert ... select' to get a better performance
If you are lucky enough to run the current PostgreSQL 9.1, there is an elegant and fast solution with a single command using the new data-modifying CTEs.
No such luck with MySQL which does not support Common Table Expressions (CTE), not to mention data-modifying CTE.
Assuming (col1, col2)
is initially unique:
Query 1
- You can easily pick arbitrary slices from the table in this case.
- No sequence numbers for
t.id
will be wasted.
WITH s AS (
SELECT id, col1, col2
FROM t
-- WHERE some condition
)
,i AS (
INSERT INTO t (col1, col2)
SELECT col1, col2 -- I gather from comments that id is a serial column
FROM s
RETURNING id, col1, col2
)
INSERT INTO tu (t, u)
SELECT i.id, tu.u
FROM tu
JOIN s ON tu.t = s.id
JOIN i USING (col1, col2);
If (col1, col2)
is not unique, I see two other ways:
Query 2
- Use the window function
row_number()
to make non-unique rows unique.
INSERT
rows without holes in the t.id
space just like in the query above.
WITH s AS (
SELECT id, col1, col2
, row_number() OVER (PARTITION BY col1, col2) AS rn
FROM t
-- WHERE some condition
)
,i AS (
INSERT INTO t (col1, col2)
SELECT col1, col2
FROM s
RETURNING id, col1, col2
)
,r AS (
SELECT *
, row_number() OVER (PARTITION BY col1, col2) AS rn
FROM i
)
INSERT INTO tu (t, u)
SELECT r.id, tu.u
FROM r
JOIN s USING (col1, col2, rn) -- match exactly one id per row
JOIN tu ON tu.t = s.id;
Query 3
- This is based on the same idea that @ypercube already supplied, but all in one query.
- If there are holes in numbers space for current
t.id
, sequence numbers will be burnt for the new rows accordingly.
- Don't forget to reset your sequence beyond the new maximum or you will get duplicate key errors for new inserts in
t
that draw the default for id
from the sequence. I integrated this as final step into the command. Fastest & safest this way.
WITH s AS (
SELECT max(id) AS max_id
FROM t
)
,i AS (
INSERT INTO t (id, col1, col2)
SELECT id + s.max_id, col1, col2
FROM t, s
)
,j AS (
INSERT INTO tu (t, u)
SELECT tu.t + s.max_id, tu.u
FROM tu, s
)
SELECT setval('t_id_seq', s.max_id + s.max_id)
FROM s;
Details about setval() in the manual.
Test setup
For a quick test.
CREATE TEMP TABLE t (id serial primary key, col1 text, col2 text);
INSERT INTO t (col1, col2) VALUES
('A', 'B')
,('C', 'D');
CREATE TEMP TABLE tu (t int, u int);
INSERT INTO tu VALUES
(1, 100)
,(1, 101)
,(2, 100)
,(2, 102);
SELECT * FROM t;
SELECT * FROM tu;
There was a somewhat similar question recently, where I provided a somewhat similar answer. Plus alternatives for version 8.3 without CTEs and window functions.
Step 1. Lock (both) tables or make sure that only this script is running. Disable FK checks.
Step 2. Use these two INSERT
statements, in this order:
INSERT INTO NM
(T, U)
SELECT
T + maxID, U
FROM
NM
CROSS JOIN
( SELECT MAX(ID) AS maxID
FROM T
) AS m
INSERT INTO T
(ID, COL1, COL2)
SELECT
ID+maxID, COL1, COL2
FROM
T
CROSS JOIN
( SELECT MAX(ID) AS maxID
FROM T
) AS m
Step 3. Re-enable FKs.