I have two Django models as shown below, MyModel1
& MyModel2
:
class MyModel1(CachingMixin, MPTTModel):
name = models.CharField(null=False, blank=False, max_length=255)
objects = CachingManager()
def __str__(self):
return "; ".join(["ID: %s" % self.pk, "name: %s" % self.name, ] )
class MyModel2(CachingMixin, models.Model):
name = models.CharField(null=False, blank=False, max_length=255)
model1 = models.ManyToManyField(MyModel1, related_name="MyModel2_MyModel1")
objects = CachingManager()
def __str__(self):
return "; ".join(["ID: %s" % self.pk, "name: %s" % self.name, ] )
MyModel2
has a ManyToMany field to MyModel1
entitled model1
Now look what happens when I add a new entry to this ManyToMany field. According to Django, it has no effect:
>>> m1 = MyModel1.objects.all()[0]
>>> m2 = MyModel2.objects.all()[0]
>>> m2.model1.all()
[]
>>> m2.model1.add(m1)
>>> m2.model1.all()
[]
Why? It seems definitely like a caching issue because I see that there is a new entry in Database table myapp_mymodel2_mymodel1 for this link between m2
& m1
. How should I fix it??
Is django-cache-machine really needed?
MyModel1.objects.all()[0]
Roughly translates to
SELECT * FROM app_mymodel LIMIT 1
Queries like this are always fast. There would not be a significant difference in speeds whether you fetch it from the cache or from the database.
When you use cache manager you actually add a bit of overhead here that might make things a bit slower. Most of the time this effort will be wasted because there may not be a cache hit as explained in the next section.
How django-cache-machine works
Whenever you run a query, CachingQuerySet
will try to find that query
in the cache. Queries are keyed by {prefix}:{sql}
. If it’s there, we
return the cached result set and everyone is happy. If the query isn’t
in the cache, the normal codepath to run a database query is executed.
As the objects in the result set are iterated over, they are added to
a list that will get cached once iteration is done.
source: https://cache-machine.readthedocs.io/en/latest/
Accordingly, with the two queries executed in your question being identical, cache manager will fetch the second result set from memcache provided the cache hasn't been invalided.
The same link explains how cache keys are invalidated.
To support easy cache invalidation, we use “flush lists” to mark the
cached queries an object belongs to. That way, all queries where an
object was found will be invalidated when that object changes. Flush
lists map an object key to a list of query keys.
When an object is saved or deleted, all query keys in its flush list
will be deleted. In addition, the flush lists of its foreign key
relations will be cleared. To avoid stale foreign key relations, any
cached objects will be flushed when the object their foreign key
points to is invalidated.
It's clear that saving or deleting an object would result in many objects in the cache having to be invalidated. So you are slowing down these operations by using cache manager. Also worth noting is that the invalidation documentation does not mention many to many fields at all. There is an open issue for this and from your comment on that issue it's clear that you have discovered it too.
Solution
Chuck cache machine. Caching all queries are almost never worth it. It leads to all kind of hard to find bugs and issues. The best approach is to optimize your tables and fine tune your queries. If you find a particular query that is too slow cache it manually.
This was my workaround solution:
>>> m1 = MyModel1.objects.all()[0]
>>> m1
<MyModel1: ID: 8887972990743179; name: my-name-blahblah>
>>> m2 = MyModel2.objects.all()[0]
>>> m2.model1.all()
[]
>>> m2.model1.add(m1)
>>> m2.model1.all()
[]
>>> MyModel1.objects.invalidate(m1)
>>> MyModel2.objects.invalidate(m2)
>>> m2.save()
>>> m2.model1.all()
[<MyModel1: ID: 8887972990743179; name: my-name-blahblah>]
Have you considered hooking into the model signals to invalidate the cache when an object is added? For your case you should look at M2M Changed Signal
Small example that doesn't solve your problem but relates the workaround you gave before to my signals solution approach (I don't know django-cache-machine):
def invalidate_m2m(sender, **kwargs):
instance = kwargs.get('instance', None)
action = kwargs.get('action', None)
if action == 'post_add':
Sender.objects.invalidate(instance)
m2m_changed.connect(invalidate_m2m, sender=MyModel2.model1.through)