Compare two tables with minus operation in oracle

2019-07-28 18:35发布

Some tables' data need to be updated(or deleted , inserted) in my system. But I want to know which data are updated,deleted and inserted. So before the data are changed ,I will backup the table in different schema just like this:

create table backup_table as select * from schema1.testtable

and after the data are changed,I want to find the difference between backup_table and testtable ,and I want to save the difference into a table in the backup schema.

the sql I will run is like this:

 CREATE TABLE TEST_COMPARE_RESULT
 AS
 SELECT 'BEFORE' AS STATUS, T1.*
 FROM (
        SELECT * FROM backup_table
        MINUS
        SELECT * FROM schema1.testtable
 ) T1
 UNION ALL
 SELECT 'AFTER' AS STATUS, T2.*
 FROM (
        SELECT * FROM schema1.testtable
        MINUS
        SELECT * FROM backup_table
  ) T2

What I am worried about is that I heared about that the minus operation will use a lot of system resource.In my sysytem, some table size will be over 700M .So I want to know how oracle will read the 700M data in memory (PGA??) or the temporary tablespace? and How I should make sure that the resource are enough to to the compare operation?

1条回答
一夜七次
2楼-- · 2019-07-28 18:53

Minus is indeed a resource intensive task. It need to read both tables and do sorts to compare the two tables. However, Oracle has advanced techniques to do this. It won't load the both tables in memory(SGA) if can't do it. It will use, yes, temporary space for sorts. But I would recommend you to have a try. Just run the query and see what happens. The database won't suffer and allways you can stop the execution of statement.

What you can do to improve the performance of the query:

First, if you have columns that you are sure that won't changed, don't include them. So, is better to write:

select a, b from t1
minus 
select a, b from t2

than using a select * from t, if there are more than these two columns, because the work is lesser.

Second, if the amount of data to compare si really big for your system(too small temp space), you should try to compare them on chunks:

select a, b from t1 where col between val1 and val2
minus 
select a, b from t2 where col between val1 and val2 

Sure, another possibility than minus is to have some log columns, let's say updated_date. selecting with where updated_date greater than start of process will show you updated records. But this depends on how you can alter the database model and etl code.

查看更多
登录 后发表回答