I have two tables, say A and B. Both have a primary key id. They have a many-to-many relationship, SEC.
SEC = Table('sec', Base.metadata,
Column('a_id', Integer, ForeignKey('A.id'), primary_key=True, nullable=False),
Column('b_id', Integer, ForeignKey('B.id'), primary_key=True, nullable=False)
)
class A():
...
id = Column(Integer, primary_key=True)
...
rels = relationship(B, secondary=SEC)
class B():
...
id = Column(Integer, primary_key=True)
...
Let's consider this piece of code.
a = A()
b1 = B()
b2 = B()
a.rels = [b1, b2]
...
#some place later
b3 = B()
a.rels = [b1, b3] # errors sometimes
Sometimes, I get an error at the last line saying
duplicate key value violates unique constraint a_b_pkey
In my understanding, I think it tries to add (a.id, b.id) into 'sec' table again resulting in a unique constraint error. Is that what it is? If so, how can I avoid this? If not, why do I have this error?
The error you mention is indeed from inserting a conflicting value to the sec table. To be sure that it is from the operation you think it is, not some previous change, turn on SQL logging and check what values is it trying to insert before erroring out.
When overwriting a many-to-many collection value, SQLAlchemy compares the new contents of the collection with the state in the database and correspondingly issues delete and insert statements. Unless you are poking around in SQLAlchemy internals, there should be two ways to encounter this error.
First is concurrent modification: Process 1 fetches the value a.rels and notices that it is empty, meanwhile Process 2 also fetches a.rels, sets it to [b1, b2] and commits flushing the (a,b1),(a,b2) tuples, Process 1 sets a.rels to [b1, b3] noticing that the previous contents was empty and when it tries to flush the sec tuple (a,b1) it gets a duplicate key error. The correct action in such cases is usually to retry the transaction from the top. You can use serializable transaction isolation to instead get a serialization error in this case that is distinct from a business logic error causing a duplicate key error.
The second case happens when you have managed to convince SQLAlchemy that you don't need to know the database state by setting the loading strategy of the rels attribute to
noload
. This can be done when defining the relationship by adding thelazy='noload'
parameter, or when querying, calling.options(noload(A.rels))
on the query. SQLAlchemy will assume that sec table has no matching rows for objects loaded with this strategy in effect.The problem is you want to make sure the instances you create are unique. We can create an alternate constructor that checks a cache of existing uncommited instances or queries the database for existing commited instance before returning a new instance.
Here is a demonstration of such a method:
The
create_unique
method was inspired by the example from the SQLAlchemy wiki. This version is much less convoluted, favoring simplicity over flexibility. I have used it in production systems with no problems.There are obviously improvements that can be added; this is just a simple example. The
get_unique
method could be inherited from aUniqueMixin
, to be used for any number of models. More flexible memoizing of arguments could be implemented. This also puts aside the problem of multiple threads inserting conflicting data mentioned by Ants Aasma; handling that is more complex but should be an obvious extension. I leave that to you.