Copy datasets with n:m-relation

I would like to use the single SQL-statement

insert into T (...) select ... from T where ...

to copy a lot of datasets. My problem is that there are N:M-relationships from table T to other tables and these have to be copied too. How can I do this, if I do not know which original dataset belongs to which copied dataset? Let me demonstrate by example.

Content of the database before:

Table T:

ID  | COL1 | COL2    
-----------------
1   | A    | B
2   | C    | D

N:M-table references table U from table T (table U is not shown):

T   | U              
---------
1   | 100
1   | 101
2   | 100
2   | 102

My copy operation where [???] is the part I do not know:

insert into T (COL1, COL2) select COL1, COL2 from T
insert into NM (T, U) select [???]

Content of the database after:

Table T:

ID  | COL1 | COL2
-----------------
1   | A    | B
2   | C    | D
3   | A    | B
4   | C    | D

N:M-table:

T   | U
---------
1   | 100
1   | 101
2   | 100
2   | 102
3   | 100
3   | 101
4   | 100
4   | 102

Notice:

I have thousands of datasets (not just two)
I want to use 'insert ... select' to get a better performance

标签： mysql sql postgresql insert many-to-many

2条回答

女痞

2楼-- · 2019-05-16 21:45

Step 1. Lock (both) tables or make sure that only this script is running. Disable FK checks.

Step 2. Use these two INSERT statements, in this order:

INSERT INTO NM 
    (T, U) 
  SELECT 
      T + maxID, U
  FROM 
      NM
    CROSS JOIN
      ( SELECT MAX(ID) AS maxID 
        FROM T
      ) AS m

INSERT INTO T 
    (ID, COL1, COL2) 
  SELECT 
      ID+maxID, COL1, COL2 
  FROM 
      T
    CROSS JOIN
      ( SELECT MAX(ID) AS maxID 
        FROM T
      ) AS m

Step 3. Re-enable FKs.

0人赞添加讨论(0) 举报

倾城　Initia

3楼-- · 2019-05-16 21:52

If you are lucky enough to run the current PostgreSQL 9.1, there is an elegant and fast solution with a single command using the new data-modifying CTEs.

No such luck with MySQL which does not support Common Table Expressions (CTE), not to mention data-modifying CTE.

Assuming (col1, col2) is initially unique:

Query 1

You can easily pick arbitrary slices from the table in this case.
No sequence numbers for t.id will be wasted.

WITH s AS (
    SELECT id, col1, col2
    FROM   t
--  WHERE  some condition
    )
    ,i AS (
    INSERT INTO t (col1, col2)
    SELECT col1, col2   -- I gather from comments that id is a serial column
    FROM   s
    RETURNING id, col1, col2
    )
INSERT INTO tu (t, u)
SELECT i.id, tu.u
FROM   tu
JOIN   s ON tu.t = s.id
JOIN   i USING (col1, col2);

If (col1, col2) is not unique, I see two other ways:

Query 2

Use the window function row_number() to make non-unique rows unique.
INSERT rows without holes in the t.id space just like in the query above.

WITH s AS (
    SELECT id, col1, col2
         , row_number() OVER (PARTITION BY col1, col2) AS rn
    FROM   t
--  WHERE some condition
    )
    ,i AS (
    INSERT INTO t (col1, col2)
    SELECT col1, col2
    FROM   s
    RETURNING id, col1, col2
    )
    ,r AS (
    SELECT *
         , row_number() OVER (PARTITION BY col1, col2) AS rn
    FROM   i
    )
INSERT INTO tu (t, u)
SELECT r.id, tu.u
FROM   r
JOIN   s USING (col1, col2, rn)    -- match exactly one id per row
JOIN   tu ON tu.t = s.id;

Query 3

This is based on the same idea that @ypercube already supplied, but all in one query.
If there are holes in numbers space for current t.id, sequence numbers will be burnt for the new rows accordingly.
Don't forget to reset your sequence beyond the new maximum or you will get duplicate key errors for new inserts in t that draw the default for id from the sequence. I integrated this as final step into the command. Fastest & safest this way.

WITH s AS (
    SELECT max(id) AS max_id
    FROM   t
    )
    ,i AS (
    INSERT INTO t (id, col1, col2)
    SELECT id + s.max_id, col1, col2
    FROM   t, s
    )
    ,j AS (
    INSERT INTO tu (t, u)
    SELECT tu.t + s.max_id, tu.u
    FROM   tu, s
    )
SELECT setval('t_id_seq', s.max_id + s.max_id)
FROM   s;

Details about setval() in the manual.

Test setup

For a quick test.

CREATE TEMP TABLE t (id serial primary key, col1 text, col2 text);
INSERT INTO t (col1, col2) VALUES 
 ('A', 'B')
,('C', 'D');

CREATE TEMP TABLE tu (t int, u int);
INSERT INTO tu VALUES
 (1, 100)
,(1, 101)
,(2, 100)
,(2, 102);

SELECT * FROM t;
SELECT * FROM tu;

There was a somewhat similar question recently, where I provided a somewhat similar answer. Plus alternatives for version 8.3 without CTEs and window functions.

0人赞添加讨论(0) 举报