I work on a system which downloads data from a cloud system to a local database (PostgreSQL, MySQL, ...). Now I'm having an issue with PostgreSQL performance because it takes a lot of time to insert the data.
A number of columns and the size of the data may vary. In a sample project, I have a table with approx. 170 columns. There is one unique index - but even after dropping the index the speed of the insert did not change.
I'm using JDBC driver to connect to the database and I'm inserting data in batches of 250 rows (using NamedParameterJdbcTemplate).
It took me approx. 18 seconds to insert the data on Postgres. The same data set on MySQL took me just a second. That's a huge difference - where does it come from? Is Postgres JDBC driver that slow? Can it be configured somehow to make it faster? Am I missing something else? The difference between Postgres and MySQL is so huge. Any other ideas how to make it faster?
I made a sample project which is available on Github - https://github.com/varad/postgresql-vs-mysql. Everything happens in LetsGo class in the "run" method.
It seems that this is a combination of a Spring "bug" and a driver "bug".
Spring tries to determine the data type of a column each time setValue()
is called. It does this by calling PreparedStatementMetaData.getParameterMetaData()
This apparently causes a "prepare" statement to be sent to the database which in itself is quite fast (never more then 1ms on my laptop) but as it is called for each column for each row this sums up to a lot of time (it's called for every not-null value which results in approx. 23.000 calls)
To some extent this is more a Spring bug then a driver bug because not caching the parameter meta data doesn't really make sense (at least in my opinion). The MySQL JDBC driver does not support getParameterMetaData()
and Spring knows this and so this "bug" doesn't show up with MySQL because spring never calls that method.
I am not sure if Postgres' JDBC driver behavior can be classified as a bug, but it sure would be nice if the driver was caching that meta data after the first call.
Spring can be convinced to not obtain the statement meta data through the property spring.jdbc.getParameterType.ignore
So by putting:
System.setProperty("spring.jdbc.getParameterType.ignore", "true");
before the line:
LetsGo letsGo = new LetsGo();
this behaviour is disabled.
The property must be set before Spring is initialized.
When I do that with your sample project, the insert runs in 500ms on my laptop.
Edit
After seeing the comment regarding the use of the Postgres-NG driver I dug into the sources of the "official" driver and the NG driver, and the NG driver does cache the parameter meta data after the first call whereas the official driver does not which explains why using the NG driver is so much faster (without disabling the call in Spring)
try using pgjdbc-ng driver and then compare your results.
It is available here:
http://impossibl.github.io/pgjdbc-ng/
I hope you are using DB Connection Pool. You can try C3P0. Spring (JDBCTemplate) doesn't provide Connection Pool implementation.