Update MySQL from CSV using JAVA

Published 2020-04-08 15:01

Question:

My problem is as follows:

  1. The CSV file is downloaded from the server at given intervals.

  2. The file has to be parsed (unnecessary spaces must be stripped).

  3. Item price, rrp and stock quantity must be updated for each barcode.

Each CSV line contains an article number, product title, warehouse id, warehouse name, price, rrp price, stock and barcode. The items table contains about 71,000 rows, and barcode is not a unique key in the database (because items with the same barcode can be in different warehouses). The issue is that the update takes more than an hour on a localhost MySQL server (about half an hour with SQLite). Is there any way to optimize the SQL query to make things faster? My current code looks like this:

    public void updateData(BufferedReader bufferedReader, Connection sqlConnection) {
        String csvLine = null;
        PreparedStatement preparedStatement = null;
        String sqlString = "UPDATE items SET price = ?, rrp = ?, stock = ? WHERE departmentid = ? AND barcode = ?";
        try {
            preparedStatement = sqlConnection.prepareStatement(sqlString);
            while ((csvLine = bufferedReader.readLine()) != null) {
                String[] splitLine = csvLine.split(";");
                preparedStatement.setBigDecimal(1, new BigDecimal(splitLine[4].trim()).setScale(2, RoundingMode.CEILING));
                preparedStatement.setBigDecimal(2, new BigDecimal(splitLine[5].trim()).setScale(2, RoundingMode.CEILING));
                preparedStatement.setInt(3, Integer.parseInt(splitLine[6].trim()));
                preparedStatement.setString(4, splitLine[2].trim());
                preparedStatement.setString(5, splitLine[8].trim());
                preparedStatement.executeUpdate();
            }
        } catch (IOException | SQLException exc) {
            System.out.println(exc.getMessage());
        } finally {
            try {
                sqlConnection.commit();
                if (preparedStatement != null) {
                    preparedStatement.close();
                }
                sqlConnection.close();
            } catch (SQLException exc) {
                exc.printStackTrace();
            }
        }
    }

So far, the fastest solution is the one suggested by @e4c5: LOAD the CSV data into a temporary table and run UPDATE items INNER JOIN temp_table ON items.barcode = temp_table.barcode AND items.departmentid = temp_table.departmentid SET items.rrp = temp_table.rrp, items.price = temp_table.price, items.stock = temp_table.stock. Any way to make this even faster?
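For reference, the whole temp-table approach can be sketched roughly as follows. Column names and types are assumptions inferred from the question, and the file path is a placeholder; the index on the temporary table is there so the join does not scan it for every row of items:

```sql
-- Sketch only: column names/types and the file path are assumptions.
CREATE TEMPORARY TABLE temp_table (
    departmentid VARCHAR(32),
    barcode      VARCHAR(64),
    price        DECIMAL(10,2),
    rrp          DECIMAL(10,2),
    stock        INT,
    INDEX (barcode, departmentid)   -- speeds up the join below
);

LOAD DATA LOCAL INFILE '/path/to/cleaned.csv'
INTO TABLE temp_table
FIELDS TERMINATED BY ';'
(departmentid, barcode, price, rrp, stock);

UPDATE items
INNER JOIN temp_table
        ON items.barcode = temp_table.barcode
       AND items.departmentid = temp_table.departmentid
SET items.price = temp_table.price,
    items.rrp   = temp_table.rrp,
    items.stock = temp_table.stock;
```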

Answer 1:

I think the best approach in your case is to use statement batching. Here is an example:

    sqlConnection.setAutoCommit(false); // <<-- disable auto-commit so the whole batch runs in one transaction
    try {
        preparedStatement = sqlConnection.prepareStatement(sqlString);
        while ((csvLine = bufferedReader.readLine()) != null) {
            String[] splitLine = csvLine.split(";");
            preparedStatement.setBigDecimal(1, new BigDecimal(splitLine[4].trim()).setScale(2, RoundingMode.CEILING));
            preparedStatement.setBigDecimal(2, new BigDecimal(splitLine[5].trim()).setScale(2, RoundingMode.CEILING));
            preparedStatement.setInt(3, Integer.parseInt(splitLine[6].trim()));
            preparedStatement.setString(4, splitLine[2].trim());
            preparedStatement.setString(5, splitLine[8].trim());

            preparedStatement.addBatch(); // <<-- add this row to the batch
        }

        // execute the multiple statements as one batch
        preparedStatement.executeBatch(); // <<--
        sqlConnection.commit(); // <<--
    } catch (IOException | SQLException exc) {
        exc.printStackTrace();
    }

EDIT

As @Mick Mnemonic mentioned in a comment:

You could try if splitting into smaller batches of say 500 rows makes any difference

So instead of executing the whole batch in one shot, you can split it into smaller batches, for example:

    sqlConnection.setAutoCommit(false);
    try {
        int nbrBatch = 500;
        int count = 0;
        preparedStatement = sqlConnection.prepareStatement(sqlString);
        while ((csvLine = bufferedReader.readLine()) != null) {
            // parse the line and bind the parameters as above
            preparedStatement.addBatch();
            if (++count % nbrBatch == 0) { // flush every 500 rows
                preparedStatement.executeBatch();
            }
        }
        // execute the rest of the batch
        preparedStatement.executeBatch();
        sqlConnection.commit();
    } catch (IOException | SQLException exc) {
        exc.printStackTrace();
    }
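Batching alone may not be enough with MySQL Connector/J, because by default each batched statement is still sent as a separate round trip. The connection property `rewriteBatchedStatements=true` lets the driver rewrite a batch into fewer network round trips, which can help noticeably. A minimal sketch of building such a URL (host, port and database name here are placeholders):

```java
public class JdbcUrlExample {

    // Builds a MySQL JDBC URL with batch rewriting enabled.
    // rewriteBatchedStatements=true lets Connector/J combine batched
    // statements into fewer round trips to the server.
    public static String batchUrl(String host, int port, String db) {
        return "jdbc:mysql://" + host + ":" + port + "/" + db
                + "?rewriteBatchedStatements=true";
    }

    public static void main(String[] args) {
        System.out.println(batchUrl("localhost", 3306, "items_db"));
    }
}
```

You would then pass this URL to `DriverManager.getConnection(...)` when opening the connection used for the batch.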


Answer 2:

There is a much faster and much better way to import your data without a single line of Java code: LOAD DATA INFILE.

Note that LOAD DATA can do some amount of preprocessing of your CSV. However, it is not adequate in some situations; then you would have to get help from Java, but Java shouldn't be doing the insert.

Your Java code should preprocess the CSV and produce another CSV that can be passed to LOAD DATA. 70,000 records is a breeze.
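That preprocessing step can be sketched like this. The only transformation the question requires is stripping the stray spaces around each semicolon-separated field; reading and writing the files around this is ordinary I/O, so the sketch just normalizes one line at a time:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class CsvCleaner {

    // Trims the whitespace around every semicolon-separated field of one
    // CSV line. Limit -1 keeps trailing empty fields intact.
    public static String cleanLine(String line) {
        return Arrays.stream(line.split(";", -1))
                .map(String::trim)
                .collect(Collectors.joining(";"));
    }

    public static void main(String[] args) {
        // Hypothetical input line in the format described in the question.
        System.out.println(cleanLine(" 1001 ; Widget ; WH1 ; Main ; 9.99 ; 12.99 ; 5 "));
    }
}
```

Apply `cleanLine` to every line while copying the downloaded file to a new one, then hand that cleaned file to LOAD DATA.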

LOAD DATA does not support the ON DUPLICATE KEY UPDATE syntax, but it does have the REPLACE syntax (which is slightly slower than a duplicate-key update, but still at least an order of magnitude faster than what Java or any other programming language can give you).

If doing an actual update is important, LOAD DATA into a temporary table and then:

INSERT INTO mytable SELECT * FROM temp_table ON DUPLICATE KEY ....
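Spelled out for the columns in the question, that pattern might look like the sketch below. The column names and the assumption of a unique key over (barcode, departmentid) are illustrative, not taken from the question:

```sql
-- Illustration only; assumes items has a UNIQUE KEY (barcode, departmentid).
INSERT INTO items (departmentid, barcode, price, rrp, stock)
SELECT departmentid, barcode, price, rrp, stock
FROM temp_table
ON DUPLICATE KEY UPDATE
    price = VALUES(price),
    rrp   = VALUES(rrp),
    stock = VALUES(stock);
```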