I need to insert a couple hundred million records into a MySQL database. I'm batch inserting 1 million records at a time. Please see my code below. It seems to be slow. Is there any way to optimize it?
try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);
    Object[] vals = set.toArray();
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
    }
    // Execute the batch, then commit
    int[] updateCounts = pstmt.executeBatch();
    connection.commit();
    System.out.println("inserted " + updateCounts.length);
} catch (SQLException e) {
    connection.rollback();
    throw e;
}
I had a similar performance issue with MySQL and solved it by setting the useServerPrepStmts and rewriteBatchedStatements properties in the connection URL.

Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");
I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.

rewriteBatchedStatements=true is the important parameter. useServerPrepStmts is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.

Now is a good time to explain how rewriteBatchedStatements=true improves performance so dramatically: it rewrites prepared INSERT statements into multi-value inserts when executeBatch() is called (Source). That means that instead of sending the following n INSERT statements to the MySQL server each time executeBatch() is called:
INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)
it would send a single INSERT statement:
INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)
You can observe this by turning on the MySQL general log (with SET GLOBAL general_log = 1), which logs each statement sent to the MySQL server into a file.
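For instance, here is a minimal sketch of toggling the general log from JDBC to watch the rewrite happen, reusing the pstmt from the question; the log file location is given by the general_log_file server variable, and setting a global variable requires sufficient privileges:

try (Statement st = connection.createStatement()) {
    st.execute("SET GLOBAL general_log = 1"); // log every statement to the general log file
}
pstmt.executeBatch(); // with rewriteBatchedStatements=true, this arrives as ONE multi-value INSERT
try (Statement st = connection.createStatement()) {
    st.execute("SET GLOBAL general_log = 0"); // turn logging back off
}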
You can insert multiple rows with one INSERT statement, and doing a few thousand at a time can greatly speed things up. That is, instead of doing e.g. 3 inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);, you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3); (see the sketch below).
(It may be that JDBC's .addBatch() does a similar optimization now, though the MySQL addBatch used to be entirely unoptimized and just issued individual queries anyway; I don't know if that's still the case with recent drivers.)
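As a rough illustration, here is a minimal sketch of building such a multi-value INSERT by hand, assuming a table tbl_name with integer columns a, b, c and the connection from the question; this is useful when your driver doesn't rewrite batches for you:

List<int[]> rows = Arrays.asList(new int[]{1, 2, 3}, new int[]{1, 2, 3}); // sample data
StringBuilder sql = new StringBuilder("INSERT INTO tbl_name (a,b,c) VALUES ");
for (int i = 0; i < rows.size(); i++) {
    sql.append(i == 0 ? "(?,?,?)" : ",(?,?,?)");
}
try (PreparedStatement ps = connection.prepareStatement(sql.toString())) {
    int p = 1; // JDBC parameters are 1-indexed
    for (int[] row : rows) {
        ps.setInt(p++, row[0]);
        ps.setInt(p++, row[1]);
        ps.setInt(p++, row[2]);
    }
    ps.executeUpdate();
}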
If you really need speed, load your data from a comma-separated file with LOAD DATA INFILE. We get around a 7-8x speedup doing that versus doing tens of millions of inserts.
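A minimal sketch of doing this from JDBC, assuming the data has already been written to data.csv, the server has local_infile enabled, and the Connector/J URL includes allowLoadLocalInfile=true; the column list (xxx) is the placeholder from the question:

try (Statement st = connection.createStatement()) {
    // Stream the client-side CSV file straight into the table
    st.execute("LOAD DATA LOCAL INFILE 'data.csv' INTO TABLE mytable "
             + "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' (xxx)");
}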
If:
- it's a new table, or the amount to be inserted is greater than the data already in it,
- there are indexes on the table, and
- you do not need other access to the table during the insert,
then ALTER TABLE tbl_name DISABLE KEYS can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS to start building the indexes, which can take a while, but not nearly as long as doing it for every insert.
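A minimal sketch of wrapping the insert with these statements, using the connection from the question (note that DISABLE KEYS affects non-unique indexes and is a MyISAM feature, so check that it actually helps with your storage engine):

try (Statement st = connection.createStatement()) {
    st.execute("ALTER TABLE mytable DISABLE KEYS"); // stop maintaining non-unique indexes
}
// ... run the batch inserts here ...
try (Statement st = connection.createStatement()) {
    st.execute("ALTER TABLE mytable ENABLE KEYS"); // rebuild the indexes in one pass
}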
You may try using a DDBulkLoad object (from the DataDirect JDBC drivers).

// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName("mytable");
bulkLoad.load("data.csv");
try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    int maxInsertBatch = 10000;
    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);
    Object[] vals = set.toArray();
    int count = 0;
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
        count++;
        // Flush every maxInsertBatch rows so the batch stays a manageable size
        if (count % maxInsertBatch == 0) {
            pstmt.executeBatch();
        }
    }
    // Execute whatever is left in the batch, then commit
    pstmt.executeBatch();
    connection.commit();
    System.out.println("inserted " + count);
} catch (SQLException e) {
    connection.rollback();
    throw e;
}