Update large sqlite database in chunks

Posted 2019-05-27 02:23

Question:

I have a SQLite database (approx. 11 GB) that has multiple tables, including the tables distance and vertices. The table distance is pretty large (120 million rows); vertices is smaller (15,000 rows). I want to use sqlite3 in Python to update one column of distance with values from a column of vertices. The table vertices has an index on column cat and another index on orig_cat.

What I am doing:

import sqlite3
db_path='path/to/db.db'

conn = sqlite3.connect(db_path)
cur = conn.cursor()

cur.execute('''UPDATE distance SET 
                from_orig_v = (SELECT orig_cat FROM vertices WHERE cat=distance.source)''')

However, running that update statement on such a large database causes a memory error: memory usage increases steadily until the process crashes. I am looking for advice on how to perform such a large update without running out of memory. Maybe by processing the update in chunks (i.e. batches of rows of the distance table) and committing after e.g. 1000 updates to free memory? How would that be done in Python/sqlite3?

Answer 1:

It should be possible to update chunks with statements like this:

UPDATE distance SET ... WHERE rowid BETWEEN 100000 AND 200000;

You don't need to use multiple transactions; the only thing that actually must be kept in memory is the list of rows to be updated in a single statement. (In theory, when memory runs out, you should get an appropriate error message. In practice, some OSes overcommit memory and don't tell the application about it until it's too late.)
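For example, the chunked UPDATE could be driven from a small Python loop like the sketch below. It reuses the correlated subquery from the question and assumes distance is an ordinary rowid table (not WITHOUT ROWID); chunk_size is an illustrative value you would tune to your memory budget:

import sqlite3

db_path = 'path/to/db.db'
chunk_size = 100000  # rowids per UPDATE; illustrative, tune as needed

conn = sqlite3.connect(db_path)
cur = conn.cursor()

# Determine the rowid range to cover (assumes distance has an implicit rowid).
cur.execute('SELECT MIN(rowid), MAX(rowid) FROM distance')
min_rowid, max_rowid = cur.fetchone()

for start in range(min_rowid, max_rowid + 1, chunk_size):
    end = start + chunk_size - 1
    cur.execute('''UPDATE distance
                   SET from_orig_v = (SELECT orig_cat FROM vertices
                                      WHERE cat = distance.source)
                   WHERE distance.rowid BETWEEN ? AND ?''', (start, end))
    conn.commit()  # finish the transaction for this chunk before starting the next

conn.close()

Committing per chunk keeps each transaction's bookkeeping small; the index on vertices.cat keeps the per-row subquery lookup fast.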