I've got 32 SQLite (3.7.9) databases with 3 tables each that I'm trying to merge together using the idiom that I've found elsewhere (each db has the same schema):
attach db1.sqlite3 as toMerge;
insert into tbl1 select * from toMerge.tbl1;
insert into tbl2 select * from toMerge.tbl2;
insert into tbl3 select * from toMerge.tbl3;
detach toMerge;
and rinse-repeating for the entire set of databases. I do this using python and the sqlite3 module:
for fn in filelist:
completedb = sqlite3.connect("complete.sqlite3")
c = completedb.cursor()
c.execute("pragma synchronous = off;")
c.execute("pragma journal_mode=off;")
print("Attempting to merge " + fn + ".")
query = "attach '" + fn + "' as toMerge;"
c.execute(query)
try:
c.execute("insert into tbl1 select * from toMerge.tbl1;")
c.execute("insert into tbl2 select * from toMerge.tbl2;")
c.execute("insert into tbl3 select * from toMerge.tbl3;")
c.execute("detach toMerge;")
completedb.commit()
except sqlite3.Error as err:
print "Error! ", type(err), " Error msg: ", err
raise
2 of the tables are fairly small, only 50K rows per db, while the third (tbl3) is larger, about 850 - 900K rows. Now, what happens is that the inserts progressively slow down until I get to about the fourth database when they grind to a near halt (on the order of a a megabyte or two in file size added every 1-3 minutes to the combined database). In case it was python, I've even tried dumping out the tables as INSERTs (.insert; .out foo; sqlite3 complete.db < foo is the skeleton, found here) and combining them in a bash script using the sqlite3 CLI to do the work directly, but I get exactly the same problem.
The table setup of tbl3 isn't too demanding - a text field containing a UUID, two integers, and four real values. My worry is that it's the number of rows, because I ran into exactly the same trouble at exactly the same spot (about four databases in) when the individual databases were an order of magnitude larger in terms of file size with the same number of rows (I trimmed the contents of tbl3 significantly by storing summary stats instead of raw data). Or maybe it's the way I'm performing the operation? Can anyone shed some light on this problem that I'm having before I throw something out the window?