I am trying to figure out how to insert many (in the order of 100k) records into a db using sqlalchemy in python3. Everything points to using transactions, however I am slightly confused as to how that is done. Some pages state that you get a transaction from connection.begin(), others places say it is session.begin() and this page here says it is session.create_transaction() which doesn't exist.
Here is what I am trying to do:
def addToTable(listOfRows):
engine = create_engine('postgresql+pypostgresql:///%s' % db,echo = False)
Session = sessionmaker(bind = engine)
session = Session()
table = myTable(engine,session)
for row in listOfRows:
table.add(row)
table.flush() ### ideally there would be a counter and you flush after a couple of thousand records
class myTable:
def __init__(self,engine,session):
self.engine = engine
self.session = session
self.transaction =createTransaction()# Create transaction code here
def add(self,row):
newRow = tableRow(row) ## This just creates a representation of a row in the DB
self.transaction.add(newRow)
self.transaction.flush()
def flush(self):
self.transaction.commit()
I highly suggest that you do both tutorials before continuing on your trip with SQLAlchemy. They are really helpful and explain many concepts. Afterwards, I suggest you read Using the Session as this then goes on to explain how the session fits into all of this.
To your problem, there are two solutions: One using the ORM and the other using the Core. The former is easier, the latter is faster. Let's take the easy road first. A transaction is only used to wrap all your statements into a single operation. That is, if something fails, you can abort all of it and are not left with something somewhere in between. So you most likely want a transaction, but it would work without one. Here is the quickest way:
Depending on your data SQLAlchemy might even be able to optimize your
INSERT
statement in such a way that it executes multiple at a time. Here is what's going on:session.begin
add_all
, but a loop with multipleadd
would also be fine)So this is clearly a good way, but it is not the fastest way, because SQLAlchemy has to go through all the ORM algorithms which can produce somewhat of an overhead. If this is a one-time database initialization, you can avoid the ORM. In that case, instead of creating an ORM class (
tableRow
), you create a dictionary with all keys (how depends on the data). Again you can use a context manager:This would most likely be slightly faster but also less convenient. It works the same way as the session above only that it constructs the statement from the Core and not the ORM.