I read this: Importing a CSV file into a sqlite3 database table using Python
and it seems that everyone suggests using line-by-line reading instead of using bulk .import from SQLite. However, that will make the insertion really slow if you have millions of rows of data. Is there any other way to circumvent this?
Update: I tried the following code to insert line by line but the speed is not as good as I expected. Is there anyway to improve it
for logFileName in allLogFilesName:
logFile = codecs.open(logFileName, 'rb', encoding='utf-8')
for logLine in logFile:
logLineAsList = logLine.split('\t')
output.execute('''INSERT INTO log VALUES(?, ?, ?, ?)''', logLineAsList)
logFile.close()
connection.commit()
connection.close()
Sqlite can do tens of thousands of inserts per second, just make sure to do all of them in a single transaction by surrounding the inserts with BEGIN and COMMIT. (executemany() does this automatically.)
As always, don't optimize before you know speed will be a problem. Test the easiest solution first, and only optimize if the speed is unacceptable.
Since this is the top result on a Google search I thought it might be nice to update this question.
From the python sqlite docs you can use
I have used this method to commit over 50k row inserts at a time and it's lightning fast.
Divide your data into chunks on the fly using generator expressions, make inserts inside the transaction. Here's a quote from sqlite optimization FAQ:
Here's how your code may look like.
Also, sqlite has an ability to import CSV files.