performance of insert with python and sqlite3

2019-03-30 01:16发布

问题:

I'm doing big batch inserts into an SQLite3 database and I'm trying to get a sense for what sort of performance I should be expecting versus what I'm actually seeing.

My table looks like this:

cursor.execute(
            "CREATE TABLE tweets(
             tweet_hash TEXT PRIMARY KEY ON CONFLICT REPLACE,
             tweet_id INTEGER,
             tweet_text TEXT)"
        )

and my inserts look like this:

cursor.executemany("INSERT INTO tweets VALUES (?, ?, ?)", to_write)

where to_write is a list of tuples.

Currently, with about 12 million rows in the database, inserting 50 000 rows is taking me around 16 minutes, running on a 2008 macbook.

Does this sound reasonable, or is there something gross happening?

回答1:

As I understand the main reason of bad performance is time you waste to commit many SQLite transactions. What to do?

Drop the indexes, then

PRAGMA synchronous = OFF (or NORMAL)

Insert blocks of N rows (define N, try N=5000 to start). Before inserting block do

BEGIN TRANSACTION

after inserting do

COMMIT

See also http://www.sqlite.org/faq.html#q19