I have started to learn Python and SQL recently and have a question.
Using Python with SQLite3 I have written the following code:
# Use sqlite3 in the file
import sqlite3
# Create people.db if it doesn't exist or connect to it if it does exist
with sqlite3.connect("people.db") as connection:
c = connection.cursor()
# Create new table called people
c.execute("""CREATE TABLE IF NOT EXISTS people(firstname TEXT, lastname TEXT, age INT, occupation TEXT)""")
people_list = [
('Simon', 'Doe', 20, 'Python Master'),
('John', 'Doe', 50, 'Java Master'),
('Jane', 'Doe', 30, 'C++ Master'),
('Smelly', 'Doe', 2, 'Shower Master')
]
# Insert dummy data into the table
c.executemany("""INSERT INTO people(firstname, lastname, age, occupation) VALUES(?, ?, ?, ?)""", people_list)
I noticed that I can do the same thing using a for loop instead of executemany like so:
# Use sqlite3 in the file
import sqlite3
# Create people.db if it doesn't exist or connect to it if it does exist
with sqlite3.connect("people.db") as connection:
c = connection.cursor()
# Create new table called people
c.execute("""CREATE TABLE IF NOT EXISTS people(firstname TEXT, lastname TEXT, age INT, occupation TEXT)""")
people_list = [
('Simon', 'Doe', 20, 'Python Master'),
('John', 'Doe', 50, 'Java Master'),
('Jane', 'Doe', 30, 'C++ Master'),
('Smelly', 'Doe', 2, 'Shower Master')
]
# Insert dummy data into the table
for person in people_list:
c.execute("""INSERT INTO people(firstname, lastname, age, occupation) VALUES(?, ?, ?, ?)""", person)
I'm just wondering which one is more efficient and used more often?
A batch insert with
executemany
will be more efficient, and the performance difference will typically be quite large as your number of records gets large. There is a significant overhead to executing an insert statement, which you incur over and over again if you insert one row at a time.You should prefer a batch insert whenever it is straightforward to do so.
In some cases, it may be much simpler to code an algorithm that inserts one row at a time. For example, if you are receiving data from somewhere one item at a time, it may be simpler to just insert it as you get it, rather than building up a list and doing a single insert when you have saved up a lot of data. In such a case, you need to consider the tradeoffs between performance and simplicity of code. It's often better to start off with the simple approach (insert one row at a time) and then only optimize to something else if you find it's not performant.