Copy datasets with n:m-relation

2019-05-16 21:12发布

I would like to use the single SQL-statement

insert into T (...) select ... from T where ...

to copy a lot of datasets. My problem is that there are N:M-relationships from table T to other tables and these have to be copied too. How can I do this, if I do not know which original dataset belongs to which copied dataset? Let me demonstrate by example.

Content of the database before:

Table T:

ID  | COL1 | COL2    
-----------------
1   | A    | B
2   | C    | D

N:M-table references table U from table T (table U is not shown):

T   | U              
---------
1   | 100
1   | 101
2   | 100
2   | 102

My copy operation where [???] is the part I do not know:

insert into T (COL1, COL2) select COL1, COL2 from T
insert into NM (T, U) select [???]

Content of the database after:

Table T:

ID  | COL1 | COL2
-----------------
1   | A    | B
2   | C    | D
3   | A    | B
4   | C    | D

N:M-table:

T   | U
---------
1   | 100
1   | 101
2   | 100
2   | 102
3   | 100
3   | 101
4   | 100
4   | 102

Notice:

  • I have thousands of datasets (not just two)
  • I want to use 'insert ... select' to get a better performance

2条回答
女痞
2楼-- · 2019-05-16 21:45

Step 1. Lock (both) tables or make sure that only this script is running. Disable FK checks.

Step 2. Use these two INSERT statements, in this order:

INSERT INTO NM 
    (T, U) 
  SELECT 
      T + maxID, U
  FROM 
      NM
    CROSS JOIN
      ( SELECT MAX(ID) AS maxID 
        FROM T
      ) AS m

INSERT INTO T 
    (ID, COL1, COL2) 
  SELECT 
      ID+maxID, COL1, COL2 
  FROM 
      T
    CROSS JOIN
      ( SELECT MAX(ID) AS maxID 
        FROM T
      ) AS m

Step 3. Re-enable FKs.

查看更多
倾城 Initia
3楼-- · 2019-05-16 21:52

If you are lucky enough to run the current PostgreSQL 9.1, there is an elegant and fast solution with a single command using the new data-modifying CTEs.

No such luck with MySQL which does not support Common Table Expressions (CTE), not to mention data-modifying CTE.

Assuming (col1, col2) is initially unique:

Query 1

  • You can easily pick arbitrary slices from the table in this case.
  • No sequence numbers for t.id will be wasted.

WITH s AS (
    SELECT id, col1, col2
    FROM   t
--  WHERE  some condition
    )
    ,i AS (
    INSERT INTO t (col1, col2)
    SELECT col1, col2   -- I gather from comments that id is a serial column
    FROM   s
    RETURNING id, col1, col2
    )
INSERT INTO tu (t, u)
SELECT i.id, tu.u
FROM   tu
JOIN   s ON tu.t = s.id
JOIN   i USING (col1, col2);

If (col1, col2) is not unique, I see two other ways:

Query 2

  • Use the window function row_number() to make non-unique rows unique.
  • INSERT rows without holes in the t.id space just like in the query above.

WITH s AS (
    SELECT id, col1, col2
         , row_number() OVER (PARTITION BY col1, col2) AS rn
    FROM   t
--  WHERE some condition
    )
    ,i AS (
    INSERT INTO t (col1, col2)
    SELECT col1, col2
    FROM   s
    RETURNING id, col1, col2
    )
    ,r AS (
    SELECT *
         , row_number() OVER (PARTITION BY col1, col2) AS rn
    FROM   i
    )
INSERT INTO tu (t, u)
SELECT r.id, tu.u
FROM   r
JOIN   s USING (col1, col2, rn)    -- match exactly one id per row
JOIN   tu ON tu.t = s.id;

Query 3

  • This is based on the same idea that @ypercube already supplied, but all in one query.
  • If there are holes in numbers space for current t.id, sequence numbers will be burnt for the new rows accordingly.
  • Don't forget to reset your sequence beyond the new maximum or you will get duplicate key errors for new inserts in t that draw the default for id from the sequence. I integrated this as final step into the command. Fastest & safest this way.

WITH s AS (
    SELECT max(id) AS max_id
    FROM   t
    )
    ,i AS (
    INSERT INTO t (id, col1, col2)
    SELECT id + s.max_id, col1, col2
    FROM   t, s
    )
    ,j AS (
    INSERT INTO tu (t, u)
    SELECT tu.t + s.max_id, tu.u
    FROM   tu, s
    )
SELECT setval('t_id_seq', s.max_id + s.max_id)
FROM   s;

Details about setval() in the manual.

Test setup

For a quick test.

CREATE TEMP TABLE t (id serial primary key, col1 text, col2 text);
INSERT INTO t (col1, col2) VALUES 
 ('A', 'B')
,('C', 'D');

CREATE TEMP TABLE tu (t int, u int);
INSERT INTO tu VALUES
 (1, 100)
,(1, 101)
,(2, 100)
,(2, 102);

SELECT * FROM t;
SELECT * FROM tu;

There was a somewhat similar question recently, where I provided a somewhat similar answer. Plus alternatives for version 8.3 without CTEs and window functions.

查看更多
登录 后发表回答