INSERT of 10 million queries under 10 minutes in O

2019-02-04 15:06发布

问题:

I am working on a file loader program.

The purpose of this program is to take an input file, do some conversions on its data and then upload the data into the database of Oracle.

The problem that I am facing is that I need to optimize the insertion of very large input data on Oracle.

I am uploading data into the table, lets say ABC.

I am using the OCI library provided by Oracle in my C++ Program. In specific, I am using OCI Connection Pool for multi-threading and loading into ORACLE. (http://docs.oracle.com/cd/B28359_01/appdev.111/b28395/oci09adv.htm )

The following are the DDL statements that have been used to create the table ABC –

CREATE TABLE ABC(
   seq_no         NUMBER NOT NULL,
   ssm_id         VARCHAR2(9)  NOT NULL,
   invocation_id  VARCHAR2(100)  NOT NULL,
   analytic_id    VARCHAR2(100) NOT NULL,
   analytic_value NUMBER NOT NULL,
   override       VARCHAR2(1)  DEFAULT  'N'   NOT NULL,
   update_source  VARCHAR2(255) NOT NULL,
   last_chg_user  CHAR(10)  DEFAULT  USER NOT NULL,
   last_chg_date  TIMESTAMP(3) DEFAULT  SYSTIMESTAMP NOT NULL
);

CREATE UNIQUE INDEX ABC_indx ON ABC(seq_no, ssm_id, invocation_id, analytic_id);
/
CREATE SEQUENCE ABC_seq;
/

CREATE OR REPLACE TRIGGER ABC_insert
BEFORE INSERT ON ABC
FOR EACH ROW
BEGIN
SELECT ABC_seq.nextval INTO :new.seq_no FROM DUAL;
END;

I am currently using the following Query pattern to upload the data into the database. I am sending data in batches of 500 queries via various threads of OCI connection pool.

Sample of SQL insert query used -

insert into ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value,
override, update_source)
select 'c','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'a','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'b','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'c','g',NULL, 'test', 123 , 'N', 'asdf' from dual

EXECUTION PLAN by Oracle for the above query -

-----------------------------------------------------------------------------
| Id  | Operation                | Name|Rows| Cost (%CPU) | Time     |
-----------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |     | 4  |     8   (0) | 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL | ABC |    |             |          |
|   2 |   UNION-ALL              |     |    |             |          |
|   3 |    FAST DUAL             |     | 1  |     2   (0) | 00:00:01 |
|   4 |    FAST DUAL             |     | 1  |     2   (0) | 00:00:01 |
|   5 |    FAST DUAL             |     | 1  |     2   (0) | 00:00:01 |
|   6 |    FAST DUAL             |     | 1  |     2   (0) | 00:00:01 |

The Run times of the program loading 1 million lines -

Batch Size = 500
Number of threads - Execution Time -
10                  4:19
20                  1:58
30                  1:17
40                  1:34
45                  2:06
50                  1:21
60                  1:24
70                  1:41
80                  1:43
90                  2:17
100                 2:06


Average Run Time = 1:57    (Roughly 2 minutes)

I need to optimize and reduce this time further. The problem that I am facing is when I put 10 million rows for uploading.

The average run time for 10 million came out to be = 21 minutes

(My target is to reduce this time to below 10 minutes)

So I tried the following steps as well -

[1] Did the partitioning of the table ABC on the basis of seq_no. Used 30 partitions. Tested with 1 million rows - The performance was very poor. almost 4 times more than the unpartitioned table.

[2] Another partitioning of the table ABC on the basis of last_chg_date. Used 30 partitions.

2.a) Tested with 1 million rows - The performance was almost equal to the unpartitioned table. Very little difference was there so it was not considered.

2.b) Again tested the same with 10 million rows. The performance was almost equal to the unpartitioned table. No noticable difference.

The following was the DDL commands were used to achieve partitioning -

CREATE TABLESPACE ts1 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts2 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts3 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts4 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts5 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts6 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts7 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts8 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts9 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts10 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts11 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts12 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts13 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts14 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts15 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts16 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts17 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts18 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts19 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts20 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts21 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts22 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts23 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts24 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts25 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts26 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts27 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts28 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts29 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts30 DATAFILE AUTOEXTEND ON;

CREATE TABLE ABC(
   seq_no           NUMBER NOT NULL,
   ssm_id           VARCHAR2(9)  NOT NULL,
   invocation_id    VARCHAR2(100)  NOT NULL,
   calc_id          VARCHAR2(100) NULL,
   analytic_id      VARCHAR2(100) NOT NULL,
   ANALYTIC_VALUE   NUMBER NOT NULL,
   override         VARCHAR2(1)  DEFAULT  'N'   NOT NULL,
   update_source    VARCHAR2(255) NOT NULL,
   last_chg_user    CHAR(10)  DEFAULT  USER NOT NULL,
   last_chg_date    TIMESTAMP(3) DEFAULT  SYSTIMESTAMP NOT NULL
)
PARTITION BY HASH(last_chg_date)
PARTITIONS 30
STORE IN (ts1, ts2, ts3, ts4, ts5, ts6, ts7, ts8, ts9, ts10, ts11, ts12, ts13,
ts14, ts15, ts16, ts17, ts18, ts19, ts20, ts21, ts22, ts23, ts24, ts25, ts26,
ts27, ts28, ts29, ts30);

CODE that I am using in the thread function (written in C++), using OCI -

void OracleLoader::bulkInsertThread(std::vector<std::string> const & statements)
{

    try
    {
        INFO("ORACLE_LOADER_THREAD","Entered Thread = %1%", m_env);
        string useOraUsr = "some_user";
        string useOraPwd = "some_password";

        int user_name_len   = useOraUsr.length();
        int passwd_name_len = useOraPwd.length();

        text* username((text*)useOraUsr.c_str());
        text* password((text*)useOraPwd.c_str());


        if(! m_env)
        {
            CreateOraEnvAndConnect();
        }
        OCISvcCtx *m_svc = (OCISvcCtx *) 0;
        OCIStmt *m_stm = (OCIStmt *)0;

        checkerr(m_err,OCILogon2(m_env,
                                 m_err,
                                 &m_svc,
                                 (CONST OraText *)username,
                                 user_name_len,
                                 (CONST OraText *)password,
                                 passwd_name_len,
                                 (CONST OraText *)poolName,
                                 poolNameLen,
                                 OCI_CPOOL));

        OCIHandleAlloc(m_env, (dvoid **)&m_stm, OCI_HTYPE_STMT, (size_t)0, (dvoid **)0);

////////// Execution Queries in the format of - /////////////////
//        insert into pm_own.sec_analytics (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value, override, update_source)
//        select 'c','b',NULL, 'test', 123 , 'N', 'asdf' from dual
//        union all select 'a','b',NULL, 'test', 123 , 'N', 'asdf' from dual
//        union all select 'b','b',NULL, 'test', 123 , 'N', 'asdf' from dual
//        union all select 'c','g',NULL, 'test', 123 , 'N', 'asdf' from dual
//////////////////////////////////////////////////////////////////

        size_t startOffset = 0;
        const int batch_size = PCSecAnalyticsContext::instance().getBatchCount();
        while (startOffset < statements.size())
        {
            int remaining = (startOffset + batch_size < statements.size() ) ? batch_size : (statements.size() - startOffset );
            // Break the query vector to meet the batch size
            std::vector<std::string> items(statements.begin() + startOffset,
                                           statements.begin() + startOffset + remaining);

            //! Preparing the Query
            std::string insert_query = "insert into ";
            insert_query += Context::instance().getUpdateTable();
            insert_query += " (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value, override, update_source)\n";

            std::vector<std::string>::const_iterator i3 = items.begin();
            insert_query += *i3 ;

            for( i3 = items.begin() + 1; i3 != items.end(); ++i3)
                insert_query += "union " + *i3 ;
            // Preparing the Statement and Then Executing it in the next step
            text *txtQuery((text *)(insert_query).c_str());
            checkerr(m_err, OCIStmtPrepare (m_stm, m_err, txtQuery, strlen((char *)txtQuery), OCI_NTV_SYNTAX, OCI_DEFAULT));
            checkerr(m_err, OCIStmtExecute (m_svc, m_stm, m_err, (ub4)1, (ub4)0, (OCISnapshot *)0, (OCISnapshot *)0, OCI_DEFAULT ));

            startOffset += batch_size;
        }

        // Here is the commit statement. I am committing at the end of each thread.
        checkerr(m_err, OCITransCommit(m_svc,m_err,(ub4)0));

        checkerr(m_err, OCIHandleFree((dvoid *) m_stm, OCI_HTYPE_STMT));
        checkerr(m_err, OCILogoff(m_svc, m_err));

        INFO("ORACLE_LOADER_THREAD","Thread Complete. Leaving Thread.");
    }

    catch(AnException &ex)
    {
        ERROR("ORACLE_LOADER_THREAD", "Oracle query failed with : %1%", std::string(ex.what()));
        throw AnException(string("Oracle query failed with : ") + ex.what());
    }
}

While the post was being answered, I was suggested several methods to optimize my INSERT QUERY. I have chosen and used QUERY I in my program for the following reasons that I discovered while testing the various INSERT Queries. On running the SQL Queries that were suggested to me - QUERY I -

insert into ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value,
override, update_source)
select 'c','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'a','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'b','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'c','g',NULL, 'test', 123 , 'N', 'asdf' from dual

EXECUTION PLAN by Oracle for Query I -

--------------------------------------------------------------------------
| Id  | Operation                | Name| Rows | Cost (%CPU)   | Time     |
--------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |     |  4   | 8   (0)       | 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL | ABC |      |               |          |
|   2 |   UNION-ALL              |     |      |               |          |
|   3 |    FAST DUAL             |     |  1   | 2   (0)       | 00:00:01 |
|   4 |    FAST DUAL             |     |  1   | 2   (0)       | 00:00:01 |
|   5 |    FAST DUAL             |     |  1   | 2   (0)       | 00:00:01 |
|   6 |    FAST DUAL             |     |  1   | 2   (0)       | 00:00:01 |

QUERY II -

insert all
into ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value,
override, update_source) values ('c','b',NULL, 'test', 123 , 'N', 'asdf')
into ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value,
override, update_source) values ('c','e',NULL, 'test', 123 , 'N', 'asdf')
into ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value,
override, update_source) values ('c','r',NULL, 'test', 123 , 'N', 'asdf')
into ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value,
override, update_source) values ('c','t',NULL, 'test', 123 , 'N', 'asdf')
select 1 from dual

EXECUTION PLAN by Oracle for Query II -

-----------------------------------------------------------------------------
| Id  | Operation           | Name| Rows  | Cost (%CPU)   | Time     |
-----------------------------------------------------------------------------
|   0 | INSERT STATEMENT    |     | 1     |     2   (0)   | 00:00:01 |
|   1 |  MULTI-TABLE INSERT |     |       |               |          |
|   2 |   FAST DUAL         |     | 1     |     2   (0)   | 00:00:01 |
|   3 |   INTO              | ABC |       |               |          |
|   4 |   INTO              | ABC |       |               |          |
|   5 |   INTO              | ABC |       |               |          |
|   6 |   INTO              | ABC |       |               |          |

As per the experiments the Query I is faster.

Here I tested on both Oracle SQL Developer , as well as I sent insert queries by my C++ program (FILELOADER) as well.

On Further reading about it, I found out that the cost shown by the Execution Plan is the number of CPU the query will use to process itself. That tells that Oracle will use more CPU to process the first query and that is why its cost goes on to be = 8.

Even by using the same insert pattern via my application, I found out that its performance it almost 1.5 times better.

I need some insight on how I can improve the performance even further..? All the things that I have tried, I have summarized them in my question. If I find or discover anything relevant, I will add to this question.

My target in to bring the upload time of 10 million queries under 10 minutes.

回答1:

I know others have mentioned this and you don't want to hear it but use SQL*Loader or external tables. My average load time for tables of approximately the same width is 12.57 seconds for just over 10m rows. These utilities have been explicitly designed to load data into the database quickly and are pretty good at it. This may incur some additional time penalties depending on the format of your input file, but there are quite a few options and I've rarely had to change files prior to loading.

If you're unwilling to do this then you don't have to upgrade your hardware yet; you need to remove every possible impediment to loading this quickly. To enumerate them, remove:

  1. The index
  2. The trigger
  3. The sequence
  4. The partition

With all of these you're obliging the database to perform more work and because you're doing this transactionally, you're not using the database to its full potential.

Load the data into a separate table, say ABC_LOAD. After the data has been completely loaded perform a single INSERT statement into ABC.

insert into abc
select abc_seq.nextval, a.*
  from abc_load a

When you do this (and even if you don't) ensure that the sequence cache size is correct; to quote:

When an application accesses a sequence in the sequence cache, the sequence numbers are read quickly. However, if an application accesses a sequence that is not in the cache, then the sequence must be read from disk to the cache before the sequence numbers are used.

If your applications use many sequences concurrently, then your sequence cache might not be large enough to hold all the sequences. In this case, access to sequence numbers might often require disk reads. For fast access to all sequences, be sure your cache has enough entries to hold all the sequences used concurrently by your applications.

This means that if you have 10 threads concurrently writing 500 records each using this sequence then you need a cache size of 5,000. The ALTER SEQUENCE document states how to change this:

alter sequence abc_seq cache 5000

If you follow my suggestion I'd up the cache size to something around 10.5m.

Look into using the APPEND hint (see also Oracle Base); this instructs Oracle to use a direct-path insert, which appends data directly to the end of the table rather than looking for space to put it. You won't be able to use this if your table has indexes but you could use it in ABC_LOAD

insert /*+ append */ into ABC (SSM_ID, invocation_id , calc_id, ... )
select 'c','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'a','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'b','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'c','g',NULL, 'test', 123 , 'N', 'asdf' from dual

If you use the APPEND hint; I'd add TRUNCATE ABC_LOAD after you've inserted into ABC otherwise this table will grow indefinitely. This should be safe as you will have finished using the table by then.

You don't mention what version or edition or Oracle you're using. There are a number of extra little tricks you can use:

  • Oracle 12c

    This version supports identity columns; you could get rid of the sequence completely.

    CREATE TABLE ABC(
       seq_no         NUMBER GENERATED AS IDENTITY (increment by 5000)
    
  • Oracle 11g r2

    If you keep the trigger; you can assign the sequence value directly.

    :new.seq_no := ABC_seq.nextval;
    
  • Oracle Enterprise Edition

    If you're using Oracle Enterprise you can speed up the INSERT from ABC_LOAD by using the PARALLEL hint:

    insert /*+ parallel */ into abc
    select abc_seq.nextval, a.*
      from abc_load a
    

    This can cause it's own problems (too many parallel processes etc), so test. It might help for the smaller batch inserts but it's less likely as you'll lose time computing what thread should process what.


tl;dr

Use the utilities that come with the database.

If you can't use them then get rid of everything that might slow the insert down and do it in bulk, 'cause that's what the database is good at.



回答2:

If you have a text file you should try SQL LOADER with direct path. It is really fast and it is designed for this kind of massive data loads. Have a look at this options that can improve the performance.

As a secondary advantage for ETL, your file in clear text will be smaller and easier to audit than 10^7 inserts.

If you need to make some transformation you can do it afterwards with oracle.



回答3:

You should try bulk insert your data. For that purpose, you can use OCI*ML. The discussion of it is here. Noteable article is here. Or you may try Oracle SQL Bulk Loader SQLLDR itself to increase your upload speed. To do that, serialize the data into csv file and call SQLLDR passing csv as an argument.

Another possible optimization is transaction strategy. Try insert all data in 1 transaction per thread/connection.

Another approach is to use MULTIPLE INSERT:

INSERT ALL
   INTO ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value, 
   override, update_source ) VALUES ('c','b',NULL, 'test', 123 , 'N', 'asdf')
   INTO ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value, 
   override, update_source ) VALUES ('a','b',NULL, 'test', 123 , 'N', 'asdf')
   INTO ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value, 
   override, update_source ) VALUES ('b','b',NULL, 'test', 123 , 'N', 'asdf')

SELECT 1 FROM DUAL;

instead insert .. union all.

Your sample data looks interindependent, that leads to inserting 1 significant row, then extending it into 4 rows with post-insert sql query.

Also, turn off all indexes before insert batch (or delete them and re-create on bulk done). Table Index reduces insert perfomance while you dont actually use it at that time (it calculates some id over every inserted row and performs corresponding operations).

Using prepared statement syntax should speed up upload routine due server would have an already parsed cached statement.

Then, optimize your C++ code: move ops out of cycle:

 //! Preparing the Query
   std::string insert_query = "insert into ";
   insert_query += Context::instance().getUpdateTable();
   insert_query += " (SSM_ID, invocation_id , calc_id, 
        analytic_id, analytic_value, override, update_source)\n";
   while (startOffset < statements.size())
   { ... }


回答4:

By the way, did you try to increase number of physical clients, not just threads? By running in a cloud on several VMs or on several physical machines. I recently read comments I think from Aerospike developers where they explain that many people are unable to reproduce their results just because they don't understand it's not that easy to make a client actually send that much queries per second (above 1M per second in their case). For instance, for their benchmark they had to run 4 clients in parallel. Maybe this particular oracle driver just is not fast enough to support more than 7-8 thousands of request per second on single machine?