可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I have a script that generates tens of thousands of inserts into a postgres db through a custom ORM. As you can imagine, it's quite slow. This is used for development purposes in order to create dummy data. Is there a simple optimization I can do at the Postgres level to make this faster? It's the only script running, sequentially, and requires no thread safety.
Perhaps I can turn off all locking, safety checks, triggers, etc? Just looking for a quick and dirty solution that will greatly speed up this process.
Thanks.
回答1:
If you don't need that kind of functionality in production environment, I'd suggest you turn fsync off from your PostgreSQL config. This will speed up the inserts dramatically.
Never turn off fsync on a production database.
回答2:
The fastest way to insert data would be the COPY
command.
But that requires a flat file as its input. I guess generating a flat file is not an option.
Don't commit too often, especially do not run this with autocommit enabled.
"Tens of thousands" sounds like a single commit at the end would be just right.
If you can convice your ORM to make use of Postgres' multi-row insert that would speed up things as well
This is an example of a multi-row insert:
insert into my_table (col1, col2)
values
(row_1_col_value1, row_1_col_value_2),
(row_2_col_value1, row_2_col_value_2),
(row_3_col_value1, row_3_col_value_2)
If you can't generate the above syntax and you are using Java make sure you are using batched statements instead of single statement inserts (maybe other DB layers allow something similar)
Edit:
jmz' post inspired me to add something:
You might also see an improvement when you increase wal_buffers
to some bigger value (e.g. 8MB) and checkpoint_segments
(e.g. 16)
回答3:
For inserts that number in the hundreds to thousands, batch them:
begin;
insert1 ...
insert2 ...
...
insert10k ...
commit;
For inserts in the millions use copy:
COPY test (ts) FROM stdin;
2010-11-29 22:32:01.383741-07
2010-11-29 22:32:01.737722-07
... 1Million rows
\.
Make sure any col used as an fk in another table is indexed if it's more than trivial in size in the other table.
回答4:
One thing you can do is remove all indexs, do your inserts, and then recreate the indexes.
回答5:
Are you sending a batch of tens of thousands of INSERTs OR are you sending tens of thousands of INSERTs?
I know with Hibernate you can batch all your SQL statements up and send them at the end in one big chunk instead of taking the tax of network and database overhead of making thousands of SQL statements individually.
回答6:
If you are just initializing constant test data, you could also put the test data into a staging table(s), then just copy the table contents, using
INSERT INTO... SELECT...
that should be about as fast as using COPY (though I did not benchmark it), with the advantage that you can copy using just SQL commands, without the hassle of setting up an external file like for COPY.
回答7:
Try to do as much as possible in one request!
insert into my_table (col1, col2)
values (
unnest(array[row_1_col_value_1, row_2_col_value_1, row3_col_value_1]),
unnest(array[row_1_col_value_2, row_2_col_value_2, row_3_col_value_2));
This resembles the suggestion of @a_horse_with_no_name. The advantage of using unnest
is: You can use query parameters that contain arrays!
insert into my_table (col1, col2)
values (unnest(:col_values_1), unnest(:col_values_2));
By collapsing three insert
statements into one, you save more than 50% of execution time.
And by using query parameters with 2000 values in a single Insert
, I get a speed factor of 150 in my application.