Django: “emulate” database trigger behavior on bulk operations

Posted 2019-08-03 16:17

Question:

It's a self-explaining question, but here we go. I'm creating a business app in Django, and I didn't want to "spread" the logic across the app AND the database, but on the other hand, I didn't want to let the database handle this task by itself either (which is possible through the use of triggers).

So I wanted to "reproduce" the behavior of database triggers, but inside the model class in Django (I'm currently using Django 1.4).

After some research, I figured out that for single objects I could override the "save" and "delete" methods of the "models.Model" class, inserting "before" and "after" hooks so they are executed before and after the parent's save/delete. Like this:

    from django.db import models
    from django.db.transaction import commit_on_success

    class MyModel(models.Model):

        def __before(self):
            pass

        def __after(self):
            pass

        @commit_on_success  # the decorator is only to ensure that everything occurs inside the same transaction
        def save(self, *args, **kwargs):
            self.__before()
            super(MyModel, self).save(*args, **kwargs)
            self.__after()

The BIG problem is with bulk operations. Django doesn't trigger the save/delete of the models when running update()/delete() from a QuerySet. Instead, it uses the QuerySet's own method. And to make it a little bit worse, it doesn't trigger any signal either.
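For illustration, a minimal sketch of the behavior described above (the `active` field is just a placeholder, assuming the MyModel class from before):

    # Runs a single SQL UPDATE; MyModel.save() is never called and no
    # pre_save/post_save signal is sent for the affected rows.
    MyModel.objects.filter(active=True).update(active=False)

    # Runs the deletion directly in SQL; MyModel.delete() is never called
    # (although, as Answer 2 notes, the delete signals do still fire).
    MyModel.objects.filter(active=False).delete()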

Edit: Just to be a little more specific: the model loading inside the view is dynamic, so it's impossible to define a "model-specific" way. In that case, I should create an abstract class and handle it there, as sketched below.
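A minimal sketch of that idea; the class and hook names are illustrative placeholders:

    from django.db import models
    from django.db.transaction import commit_on_success

    class TriggeredModel(models.Model):
        class Meta:
            abstract = True

        def _before_save(self):
            pass

        def _after_save(self):
            pass

        @commit_on_success
        def save(self, *args, **kwargs):
            self._before_save()
            super(TriggeredModel, self).save(*args, **kwargs)
            self._after_save()

Every dynamically loaded model then only needs to inherit from TriggeredModel and override the hooks it cares about.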

My last attempt was to create a custom Manager and override its update method, looping over the models inside the queryset and triggering the save() of each one (taking into consideration the implementation above, or the signals system). It works, but results in a database "overload" (imagine a 10k-row queryset being updated).
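That attempt would look roughly like this (NaiveTriggerManager is a hypothetical name; note the one-UPDATE-per-row cost):

    from django.db import models
    from django.db.transaction import commit_on_success

    class NaiveTriggerManager(models.Manager):
        @commit_on_success
        def update(self, **kwargs):
            # Correct with respect to the hooks/signals, but issues one
            # SQL UPDATE per row -- the "overload" described above.
            for obj in self.get_query_set():
                for field, value in kwargs.items():
                    setattr(obj, field, value)
                obj.save()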

Answer 1:

With a few caveats, you can override the queryset's update method to fire the signals while still using a single SQL UPDATE statement:

    from django.db.models.query import QuerySet
    from django.db.models.signals import pre_save, post_save
    from django.db.transaction import commit_on_success

    class CustomQuerySet(QuerySet):
        @commit_on_success
        def update(self, **kwargs):
            for instance in self:
                pre_save.send(sender=instance.__class__, instance=instance, raw=False,
                              using=self.db, update_fields=kwargs.keys())
            # use self instead of self.all() if you want to reload all data
            # from the db for the post_save signal
            result = super(CustomQuerySet, self.all()).update(**kwargs)
            for instance in self:
                post_save.send(sender=instance.__class__, instance=instance, created=False,
                               raw=False, using=self.db, update_fields=kwargs.keys())
            return result

        update.alters_data = True

I clone the current queryset (using self.all()) because the update method clears the queryset object's cache.
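To actually route update() calls through this class, the queryset still has to be returned by the model's manager; a minimal sketch for Django 1.4 (where the hook is get_query_set, renamed get_queryset in later versions):

    from django.db import models

    class CustomManager(models.Manager):
        def get_query_set(self):
            return CustomQuerySet(self.model, using=self._db)

    class MyModel(models.Model):
        objects = CustomManager()

With this in place, MyModel.objects.filter(...).update(...) goes through CustomQuerySet.update and fires the signals.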

There are a few issues that may or may not break your code. First of all, it introduces a race condition: the pre_save signal's receivers may act on data that is no longer accurate by the time you update the database.

There may also be some serious performance issues with large querysets. Unlike a plain update, all models have to be loaded into memory, and then the signals still need to be executed. Especially if the signals themselves have to interact with the database, performance can be unacceptably slow. And unlike the regular pre_save signal, changing the model instance will not automatically cause the database to be updated, since the instance itself is not used to save the new data.

There are probably some more issues that will cause a problem in a few edge cases.

Anyway, if you can handle these issues without running into serious problems, I think this is the best way to do it. It produces as little overhead as possible while still loading the models into memory, which is pretty much required to correctly execute the various signals.



Answer 2:

First, instead of overriding save to add __before and __after methods, you can use the built-in pre_save, post_save, pre_delete, and post_delete signals. https://docs.djangoproject.com/en/1.4/topics/signals/

    from django.db import models
    from django.db.models.signals import post_save

    class YourModel(models.Model):
        pass

    def after_save_your_model(sender, instance, **kwargs):
        pass

    # register the signal
    post_save.connect(after_save_your_model, sender=YourModel, dispatch_uid=__file__)

pre_delete and post_delete will get triggered when you call delete() on a queryset.

For bulk updating, however, you'll have to call the function you want to trigger manually. And you can wrap it all in a transaction as well.
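A minimal sketch of that, reusing after_save_your_model from above (the helper name is illustrative):

    from django.db.transaction import commit_on_success

    @commit_on_success
    def update_with_trigger(queryset, **kwargs):
        # One SQL UPDATE for the whole queryset...
        queryset.update(**kwargs)
        # ...then fire the "trigger" once per updated row. update() cleared
        # the queryset cache, so this iteration re-fetches the new values.
        for instance in queryset:
            after_save_your_model(sender=instance.__class__, instance=instance)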

To call the proper trigger function if you're using dynamic models, you can inspect the model's ContentType. For example:

    from django.contrib.contenttypes.models import ContentType
    from django.db.models import get_model

    def view(request, app, model_name, method):
        ...
        model = get_model(app, model_name)
        content_type = ContentType.objects.get_for_model(model)
        if content_type == ContentType.objects.get_for_model(YourModel):
            after_save_your_model(model)  # pass whatever arguments your trigger function expects
        elif content_type == ContentType.objects.get_for_model(AnotherModel):
            another_trigger_function(model)