I'm trying to implement (what I think is) a pretty simple data model for a counter:
class VisitorDayTypeCounter(models.Model):
    visitType = models.CharField(max_length=60)
    visitDate = models.DateField('Visit Date')
    counter = models.IntegerField()
When someone comes through, it will look for a row that matches the visitType and visitDate; if this row doesn't exist, it will be created with counter=0.
Then we increment the counter and save.
My concern is that this process is totally a race. Two requests could simultaneously check to see if the entity is there, and both of them could create it. Between reading the counter and saving the result, another request could come through and increment it (resulting in a lost count).
So far I haven't really found a good way around this, either in the Django documentation or in the tutorial (in fact, it looks like the tutorial has a race condition in the Vote part of it).
How do I do this safely?
Why not use the database as the concurrency layer? Add a primary key or unique constraint to the table covering visitType and visitDate. If I'm not mistaken, Django does not exactly support this in its database Model class, or at least I've not seen an example.
Once you've added the constraint/key to the table, then all you have to do is:
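The snippet that belonged here is not preserved, so what follows is only a minimal sketch of the idea, using Python's standard `sqlite3` module rather than Django. The table name `visitor_day_type_counter` and the helpers `make_connection` and `record_visit` are made up for illustration. It attempts the insert, ignores the duplicate-key error, and then lets the database do the increment in a single UPDATE.

```python
import sqlite3


def make_connection():
    """Open an in-memory database with a unique key on (visitType, visitDate)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE visitor_day_type_counter (
            visitType TEXT NOT NULL,
            visitDate TEXT NOT NULL,
            counter   INTEGER NOT NULL DEFAULT 0,
            UNIQUE (visitType, visitDate)
        )
        """
    )
    return conn


def record_visit(conn, visit_type, visit_date):
    """Create the row if it is missing, then bump the counter in one statement."""
    try:
        conn.execute(
            "INSERT INTO visitor_day_type_counter (visitType, visitDate, counter) "
            "VALUES (?, ?, 0)",
            (visit_type, visit_date),
        )
    except sqlite3.IntegrityError:
        # The unique constraint fired: the row already exists, possibly
        # created a moment ago by a concurrent request.
        pass
    # The database performs the read-modify-write itself, so two callers
    # cannot both read the same old value and overwrite each other.
    conn.execute(
        "UPDATE visitor_day_type_counter SET counter = counter + 1 "
        "WHERE visitType = ? AND visitDate = ?",
        (visit_type, visit_date),
    )
    conn.commit()
```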
It's nasty to do it this way, but it seems quick enough and would cover most situations.
Two suggestions:
Add a unique_together to your model, and wrap the creation in an exception handler to catch duplicates:
After this, you could still have a minor race condition on the counter's update. If you get enough traffic to be concerned about that, I would suggest looking into transactions for finer-grained database control. I don't think the ORM has direct support for locking/synchronization. The transaction documentation is available here.
As of Django 1.1 you can use the ORM's F() expressions.
For more details see the documentation:
https://docs.djangoproject.com/en/1.8/ref/models/instances/#updating-attributes-based-on-existing-fields
https://docs.djangoproject.com/en/1.8/ref/models/expressions/#django.db.models.F
You should use database transactions to avoid this kind of race condition. A transaction lets you perform the whole operation of creating, reading, incrementing, and saving the counter on an "all or nothing" basis. If anything goes wrong, it rolls back the whole thing and you can try again.
Check out the Django docs. There is a transaction middleware, or you can use decorators around views or methods to create transactions.
You can use the patch from http://code.djangoproject.com/ticket/2705 to add support for database-level locking.
With the patch applied, this code will be atomic:
If you truly want the counter to be accurate, you could use a transaction, but the amount of synchronization required will really drag your application and database down under any significant load. Instead, think about a more messaging-style approach: just keep dumping a count record into a table for each visit where you would otherwise increment the counter. Then, when you want the total number of visits, do a count on the visits table. You could also have a background process that runs any number of times a day, sums the visits, and stores the result in the parent table. To save space, it would also delete the records it has summed from the child visits table. You'll cut down on your concurrency costs by a huge amount if you don't have multiple agents vying for the same resource (the counter).
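To make that concrete, here is a rough sketch using Python's standard `sqlite3` module. The table names `visit` and `visit_counter` and the helpers `log_visit`, `roll_up`, and `total_visits` are all made up for illustration. Each visit is a plain INSERT, so requests never contend for a shared row, and the roll-up only touches rows up to the highest id it saw, so visits that arrive while it runs are kept for the next pass.

```python
import sqlite3


def make_connection():
    """Create the child 'visit' log and the parent 'visit_counter' table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE visit (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            visitType TEXT NOT NULL,
            visitDate TEXT NOT NULL
        );
        CREATE TABLE visit_counter (
            visitType TEXT NOT NULL,
            visitDate TEXT NOT NULL,
            counter   INTEGER NOT NULL DEFAULT 0,
            UNIQUE (visitType, visitDate)
        );
        """
    )
    return conn


def log_visit(conn, visit_type, visit_date):
    """Record one visit; an append-only insert with no shared row to fight over."""
    with conn:
        conn.execute(
            "INSERT INTO visit (visitType, visitDate) VALUES (?, ?)",
            (visit_type, visit_date),
        )


def roll_up(conn):
    """Background job: fold logged visits into visit_counter, then delete them."""
    with conn:  # commits on success, rolls back on error
        max_id = conn.execute("SELECT MAX(id) FROM visit").fetchone()[0]
        if max_id is None:
            return  # nothing logged since the last run
        groups = conn.execute(
            "SELECT visitType, visitDate, COUNT(*) FROM visit "
            "WHERE id <= ? GROUP BY visitType, visitDate",
            (max_id,),
        ).fetchall()
        for visit_type, visit_date, n in groups:
            cur = conn.execute(
                "UPDATE visit_counter SET counter = counter + ? "
                "WHERE visitType = ? AND visitDate = ?",
                (n, visit_type, visit_date),
            )
            if cur.rowcount == 0:
                conn.execute(
                    "INSERT INTO visit_counter (visitType, visitDate, counter) "
                    "VALUES (?, ?, ?)",
                    (visit_type, visit_date, n),
                )
        # Only delete what was summed; later rows wait for the next pass.
        conn.execute("DELETE FROM visit WHERE id <= ?", (max_id,))


def total_visits(conn, visit_type, visit_date):
    """Rolled-up count plus any visits not yet folded in."""
    key = (visit_type, visit_date)
    rolled = conn.execute(
        "SELECT counter FROM visit_counter WHERE visitType = ? AND visitDate = ?",
        key,
    ).fetchone()
    pending = conn.execute(
        "SELECT COUNT(*) FROM visit WHERE visitType = ? AND visitDate = ?", key
    ).fetchone()[0]
    return (rolled[0] if rolled else 0) + pending
```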