How do I memoize expensive calculations on Django

Posted 2019-03-24 18:22

Question:

I have several TextField columns on my UserProfile object which contain JSON objects. I've also defined a getter/setter property for each column, which encapsulates the logic for serializing and deserializing the JSON to and from Python data structures.

The nature of this data ensures that it will be accessed many times by view and template logic within a single request. To save on deserialization costs, I would like to memoize the Python data structures on read, invalidating the memo on a direct write to the property or on a save signal from the model object.

Where/How do I store the memo? I'm nervous about using instance variables, as I don't understand the magic behind how any particular UserProfile is instantiated by a query. Is __init__ safe to use, or do I need to check the existence of the memo attribute via hasattr() at each read?

Here's an example of my current implementation:

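import simplejson
from django.db import models

class UserProfile(models.Model):
    # text_defaults is defined elsewhere and supplies the default JSON string.
    text_json = models.TextField(default=text_defaults)

    @property
    def text(self):
        # Deserialize on first read, then reuse the memo. Testing against None
        # (rather than truthiness) keeps empty values such as {} memoized too.
        if getattr(self, "text_memo", None) is None:
            self.text_memo = simplejson.loads(self.text_json)
        return self.text_memo

    @text.setter
    def text(self, value=None):
        # Drop the memo so the next read deserializes the new value.
        self.text_memo = None
        self.text_json = simplejson.dumps(value)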

Answer 1:

You may be interested in Django's built-in decorator, django.utils.functional.memoize.

Django uses it to cache expensive operations such as URL resolving.
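
A rough sketch of the same idea using only the standard library: functools.lru_cache memoizes a function by its arguments. As far as I know, django.utils.functional.memoize was deprecated and then removed in later Django releases in favor of lru_cache, so check your version. The function name parse_profile_json and the sample payload are made up for illustration.

```python
import json
from functools import lru_cache

call_count = 0  # counts real deserializations, for demonstration only


@lru_cache(maxsize=128)
def parse_profile_json(raw):
    """Deserialize a JSON string; repeated calls with the same string hit the cache."""
    global call_count
    call_count += 1
    return json.loads(raw)


first = parse_profile_json('{"theme": "dark"}')
second = parse_profile_json('{"theme": "dark"}')  # served from the cache
```

Note that the cache hands back the same dict object on every hit, so mutating the result in one place changes what every later caller sees.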



Answer 2:

Generally, I use a pattern like this:

def get_expensive_operation(self):
    if not hasattr(self, '_expensive_operation'):
        self._expensive_operation = self.expensive_operation()
    return self._expensive_operation

Then you use the get_expensive_operation method to access the data.
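
To show how this pattern might handle the invalidation the question asks about, here is a minimal sketch on a plain Python class. Profile, get_text and set_text are hypothetical stand-ins rather than Django API; the same approach should carry over to a model instance, since the memo lives in the instance's own attributes.

```python
import json


class Profile:
    """Plain-Python stand-in for a model instance (hypothetical)."""

    def __init__(self, text_json):
        self.text_json = text_json

    def get_text(self):
        # Deserialize only if no memo exists yet on this instance.
        if not hasattr(self, "_text"):
            self._text = json.loads(self.text_json)
        return self._text

    def set_text(self, value):
        self.text_json = json.dumps(value)
        # Invalidate: remove the memo so the next read deserializes again.
        self.__dict__.pop("_text", None)


profile = Profile('{"a": 1}')
before = profile.get_text()
same = profile.get_text()   # reuses the memo, no second json.loads
profile.set_text({"a": 2})  # write drops the memo
after = profile.get_text()
```

Because the check uses hasattr rather than __init__, it does not matter how the instance was constructed, which sidesteps the worry about how a query instantiates objects.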

However, in your particular case, I think you are approaching this in slightly the wrong way. You need to do the deserialization when the model is first loaded from the database, and serialize on save only. Then you can simply access the attributes as a standard Python dictionary each time. You can do this by defining a custom JSONField type, subclassing models.TextField, which overrides to_python and get_db_prep_save.

In fact someone's already done it: see here.



Answer 3:

For methods defined on a class (instance methods), you should use django.utils.functional.cached_property.

Because the first argument of an instance method is self, memoize keeps a reference to the object, along with the function's results, even after you have discarded it. This can leak memory by preventing the garbage collector from reclaiming the stale object. cached_property turns the hasattr pattern from the second answer (Daniel's suggestion) into a decorator.
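
A minimal sketch of the decorator form, assuming Python 3.8+ and using the standard library's functools.cached_property so it runs without Django; my understanding is that Django's cached_property behaves the same way for this purpose, storing the result on the instance. The Profile class and its loads counter are hypothetical.

```python
import json
from functools import cached_property


class Profile:
    """Plain-Python stand-in for a model instance (hypothetical)."""

    def __init__(self, text_json):
        self.text_json = text_json
        self.loads = 0  # counts real deserializations, for demonstration only

    @cached_property
    def text(self):
        # Runs once per instance; the result is stored in the instance __dict__.
        self.loads += 1
        return json.loads(self.text_json)


profile = Profile('{"a": 1}')
first = profile.text
second = profile.text            # cached, no second json.loads
del profile.text                 # invalidate the cached value
profile.text_json = '{"a": 2}'
third = profile.text             # deserialized again
```

Since the cached value lives on the instance, it is garbage-collected together with the object, which avoids the leak described above.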