Which would be fastest, 1x insert 512 rows, 4x ins

Posted 2019-07-31 17:11

Question:

I've got 512 rows to insert into a database. I'm wondering if there is any advantage to submitting multiple inserts over one large insert. For example:

1x 512 row insert --

INSERT INTO mydb.mytable (id, phonenumber)
VALUES (1, '555-555-5555'), (2, '555-555-5555'), (3, '555-555-5555'), -- repeat to id = 512

VS 4x 128 row insert

INSERT INTO mydb.mytable (id, phonenumber)
VALUES (1, '555-555-5555'), (2, '555-555-5555'), (3, '555-555-5555'), -- repeat to id = 128
INSERT INTO mydb.mytable (id, phonenumber)
VALUES (129, '555-555-5555'), (130, '555-555-5555'), (131, '555-555-5555'), -- repeat to id = 256, then the next 128, then the final 128

VS 512x 1 row insert

INSERT INTO mydb.mytable (id, phonenumber)
VALUES (1, '555-555-5555');
INSERT INTO mydb.mytable (id, phonenumber)
VALUES (2, '555-555-5555'); -- repeat until id = 512

And a question about testing this: say I set this up as a test and try the first approach, one large 512-row insert, and it takes 0.5 seconds. Then the next time it takes 0.3 seconds. Does the caching that I think will happen, like it does when a programming language performs the same action twice, also happen in SQL? (If so, would it be necessary to average several test runs for each approach?)

What other considerations should I take into account when doing extremely large inserts (say, half a million rows)? Is it true that if the packet sent to the database is too large, it will never receive or execute the query? Will I ever run into trouble by making a very large insert?

Answer 1:

My answer assumes SQL Server; I suspect what I say applies to other SQL engines as well.

Much of the overhead of any SQL Server query is the development of an execution plan. If you do this as a single insert, it will have to develop the execution plan once; if you do 512 separate inserts, it will have to develop the execution plan 512 times. So a single insert carries considerably less overhead.

I wouldn't be surprised to find that the engine finds other efficiencies that you either wouldn't know to do, wouldn't think to do, or wouldn't be able to do. But even if it were only the execution plan savings, it would still be worth doing as a single insert.



Answer 2:

The answer is likely to vary based on which RDBMS product you're using. One can't make a fine-grained optimization plan in an implementation-agnostic way.

But you can make broad observations; for example, it's better to move loop-invariant work out of a loop.

In the case of a loop of many INSERTs to the same table, you can make an educated guess that the loop invariants are things like SQL parsing and query execution planning. Some optimizer implementations cache the query execution plan; others don't.
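
If the inserts have to stay as separate statements, one way to keep the SQL text itself invariant is a parameterized statement that is re-executed with new values. The sketch below is only an illustration: it uses Python's built-in sqlite3 module and an in-memory SQLite database as stand-ins (assumptions, not part of the question), and whether a given server actually reuses the plan still depends on the product and the driver.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database; the table
# and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, phonenumber TEXT)")

rows = [(i, "555-555-5555") for i in range(1, 513)]

# One SQL text with placeholders: the statement is the loop invariant,
# only the bound values change from row to row.
insert_sql = "INSERT INTO mytable (id, phonenumber) VALUES (?, ?)"

with conn:  # commit once at the end instead of once per row
    conn.executemany(insert_sql, rows)

row_count = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
```

Binding values through placeholders also avoids the quoting problems of pasting literals into the statement text.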

So we can assume that a single INSERT of 512 rows is likely to be more efficient. Again, your mileage may vary in a given implementation.
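
Since the result can vary by implementation, it is easy enough to measure. Here is a rough benchmark sketch of the three variants from the question, again using in-memory SQLite as a stand-in (an assumption; numbers from a networked server will look different). The helper names `build_statements`, `time_once` and `benchmark` are made up for this example. Each run starts from a fresh table and the median of several runs is reported, so a single warm or cold run does not dominate.

```python
import sqlite3
import statistics
import time

PHONE = "555-555-5555"  # trusted constant, so inlining it as a literal is safe here
TOTAL_ROWS = 512


def build_statements(batch_size):
    """Return the INSERT statements needed to load TOTAL_ROWS rows in batches."""
    statements = []
    for start in range(1, TOTAL_ROWS + 1, batch_size):
        stop = min(start + batch_size, TOTAL_ROWS + 1)
        values = ", ".join(f"({i}, '{PHONE}')" for i in range(start, stop))
        statements.append(
            f"INSERT INTO mytable (id, phonenumber) VALUES {values}"
        )
    return statements


def time_once(batch_size):
    """Load a fresh in-memory table; return (elapsed seconds, rows loaded)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, phonenumber TEXT)")
    statements = build_statements(batch_size)
    started = time.perf_counter()
    with conn:  # one transaction around all statements of this run
        for sql in statements:
            conn.execute(sql)
    elapsed = time.perf_counter() - started
    rows = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
    conn.close()
    return elapsed, rows


def benchmark(batch_size, repeats=20):
    """Median elapsed time over several independent runs."""
    timings = [time_once(batch_size)[0] for _ in range(repeats)]
    return statistics.median(timings)


if __name__ == "__main__":
    for batch_size in (512, 128, 1):
        print(f"{TOTAL_ROWS // batch_size:>3} x {batch_size:>3} rows: "
              f"{benchmark(batch_size) * 1e3:.3f} ms")
```

All variants run inside one transaction here, so the comparison isolates per-statement overhead; committing after every statement would be a separate thing to measure.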

As for loading millions of rows, you should really consider bulk-loading tools. Most RDBMS brands have their own special tools or non-standard SQL statements to provide efficient bulk-loading, and this can be faster than any INSERT-based solution by an order of magnitude.

  • The Data Loading Performance Guide (Microsoft SQL Server)
  • Oracle Bulk Insert tips (Oracle)
  • How to load large files safely into InnoDB with LOAD DATA INFILE (MySQL)
  • Populating a database (PostgreSQL)

So at that scale, worrying about whether a single INSERT is a little bit more efficient than multiple INSERTs is wasted time.
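
Where a bulk loader is not an option, a fallback is to split the rows into fixed-size batches so that no single statement grows without bound (some servers cap the size of one statement; MySQL's max_allowed_packet is one such setting). The following is a minimal sketch of that idea, not a replacement for the tools above. It assumes Python's sqlite3 with an in-memory database; `batched` and `load_in_batches` are hypothetical helper names, and 50,000 rows are used only to keep the demo quick.

```python
import sqlite3
from itertools import islice


def batched(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_in_batches(conn, rows, batch_size=100):
    """Send one multi-row INSERT per batch, all in one transaction."""
    sent = 0
    with conn:
        for batch in batched(rows, batch_size):
            # One "(?, ?)" group per row; values are bound, not inlined.
            placeholders = ", ".join(["(?, ?)"] * len(batch))
            params = [value for row in batch for value in row]
            conn.execute(
                f"INSERT INTO mytable (id, phonenumber) VALUES {placeholders}",
                params,
            )
            sent += len(batch)
    return sent


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, phonenumber TEXT)")

# A generator, so the full data set never has to sit in memory at once.
rows = ((i, "555-555-5555") for i in range(1, 50_001))
loaded = load_in_batches(conn, rows, batch_size=100)
```

The batch size is the knob to tune: larger batches mean fewer round trips, smaller ones keep each statement and its parameter list well under whatever limits the server imposes.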



Answer 3:

In many databases, maintaining indexes adds overhead to every insert. It is worth testing whether it is faster to drop or disable the indexes before a large insert and rebuild them afterwards.
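
A rough way to run that test, sketched with Python's sqlite3 and an in-memory database as a stand-in (an assumption; the index name `idx_phone`, the helper `load` and the generated phone numbers are made up for the example). It loads the same rows twice, once with the index in place during the insert and once with the index created afterwards, and times both.

```python
import sqlite3
import time


def load(rows, index_first):
    """Load rows with the index created before or after the insert."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, phonenumber TEXT)")
    create_index = "CREATE INDEX idx_phone ON mytable (phonenumber)"
    started = time.perf_counter()
    if index_first:
        conn.execute(create_index)  # index is maintained on every insert
    with conn:
        conn.executemany(
            "INSERT INTO mytable (id, phonenumber) VALUES (?, ?)", rows
        )
    if not index_first:
        conn.execute(create_index)  # index is built once, after the load
    elapsed = time.perf_counter() - started
    count = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
    indexes = [row[1] for row in conn.execute("PRAGMA index_list(mytable)")]
    conn.close()
    return elapsed, count, indexes


if __name__ == "__main__":
    rows = [(i, f"555-{i % 1000:03d}-{i % 10000:04d}") for i in range(1, 20_001)]
    for index_first in (True, False):
        elapsed, count, indexes = load(rows, index_first)
        label = "index kept during load" if index_first else "index built after load"
        print(f"{label}: {elapsed * 1e3:.1f} ms for {count} rows")
```

Which variant wins depends on the engine, the number of indexes and the data volume, so treat the output as a measurement for one setup rather than a general rule.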