Improve insert performance massively

2019-02-12 22:09发布

In my application I need to massively improve insert performance. Example: A file with about 21K records takes over 100 min to insert. There are reasons it can takes some time, like 20 min or so but over 100 min is just too long.

Data is inserted into 3 tables (many-to-many). Id's are generated from a sequence but I have already googled and set hibernate.id.new_generator_mappings = true and allocationSize + sequence increment to 1000.

Also the amount of data is not anything extraordinary at all, the file is 90 mb.

I have verified with visual vm that most of the time is spent in jdbc driver (postgresql) and hibernate. I think the issue is related to a unique constraint in the child table. The service layer makes a manual check (=SELECT) before inserting. If the record already exists, it reuses it instead of waiting for a constraint exception.

So to sum it up for the specific file there will be 1 insert per table (could be different but not for this file which is the ideal (fastest) case). That means total 60k inserts + 20k selects. Still over 100 min seems very long (yeah hardware counts and it is on a simple PC with 7200 rpm drive, no ssd or raid). However this is an improved version over a previous application (plain jdbc) on which the same insert on this hardware took about 15 min. Considering that in both cases about 4-5 min is spent on "pre-processing" the increase is massive.

Any tips who this could be improved? Is there any batch loading functionality?

2条回答
时光不老,我们不散
2楼-- · 2019-02-12 22:33

sounds like a database problem. check your tables if they use InnoDB or MyISAM, the latter is in my experience very slow with insert and is the default for new dbs. remove foreign keys as far as you can

If your problem really is related to a single unique index InnoDB could do the trick.

查看更多
冷血范
3楼-- · 2019-02-12 22:37

see

spring-data JPA: manual commit transaction and restart new one

Add entityManager.flush() and entityManager.clear() after every n-th call to save() method. If you use hibernate add hibernate.jdbc.batch_size=100 which seems like a reasonable choice.

Performance increase was > 10x, probably close to 100x.

查看更多
登录 后发表回答