OR definition of filters when using relations in d

2019-08-30 08:14发布

问题:

I have three models with a simple relation as below:

models.py

class Person(models.Model):
    first_name = models.CharField(max_length=20)
    last_name = models.CharField(max_length=20)

class PersonSession(models.Model):
    start_time = models.DateTimeField(auto_now_add=True)
    end_time = models.DateTimeField(null=True,
                                    blank=True)
    person = models.ForeignKey(Person, related_name='sessions')

class Billing(models.Model):
    DEBT = 'DE'
    BALANCED = 'BA'
    CREDIT = 'CR'

    session = models.OneToOneField(PersonSession,
                                   blank=False,
                                   null=False,
                                   related_name='billing')
    STATUS = ((BALANCED, 'Balanced'),
              (DEBT, 'Debt'),
              (CREDIT, 'Credit'))

    status = models.CharField(max_length=2,
                              choices=STATUS,
                              blank=False,
                              default=BALANCED
                              )

views.py

class PersonFilter(django_filters.FilterSet):
    start_time = django_filters.DateFromToRangeFilter(name='sessions__start_time',
                                 distinct=True)
    billing_status = django_filters.ChoiceFilter(name='sessions__billing__status',
                        choices=Billing.STATUS,
                        distinct=True)

    class Meta:
        model = Person
        fields = ('first_name', 'last_name')

class PersonList(generics.ListCreateAPIView):
    queryset = Person.objects.all()
    serializer_class = PersonSerializer
    filter_backends = (django_filters.rest_framework.DjangoFilterBackend)
    filter_class = PersonFilter

I want to get billings from person endpoint which have DE status in billing and are between a period of time:

api/persons?start_time_0=2018-03-20&start_time_1=2018-03-23&billing_status=DE

But the result is not what I were looking for, this returns all persons has a session in that period and has a billing with the DE status, whether that billing is on the period or not.

In other words, it seems use or operation between two filter fields, I think this post is related to this issue but currently I could not find a way to get the result I want. I am using djang 1.10.3.

Edit

I try to write an example to show what I need and what I get from django filter. If I get persons using below query in the example, I got just two person:

select * 
from 
test_filter_person join test_filter_personsession on test_filter_person.id=test_filter_personsession.person_id join test_filter_billing on test_filter_personsession.id=test_filter_billing.session_id 
where
start_time > '2000-02-01' and start_time < '2000-03-01' and status='DE';

Which gets me just person 1 and 2. But if I get somethings expected similar from url I would get all of persons, the similar url (at least one which I expected to be the same) is as below:

http://address/persons?start_time_0=2000-02-01&start_time_1=2000-03-01&billing_status=DE

Edit2

This is the data that my queries in the example are upon and using them you can see what must returns in queries that I mentioned above:

 id | first_name | last_name | id |        start_time         |         end_time          | person_id | id | status | session_id 
----+------------+-----------+----+---------------------------+---------------------------+-----------+----+--------+------------
  0 | person     | 0         |  0 | 2000-01-01 16:32:00+03:30 | 2000-01-01 17:32:00+03:30 |         0 |  0 | DE     |          0
  0 | person     | 0         |  1 | 2000-02-01 16:32:00+03:30 | 2000-02-01 17:32:00+03:30 |         0 |  1 | BA     |          1
  0 | person     | 0         |  2 | 2000-03-01 16:32:00+03:30 | 2000-03-01 17:32:00+03:30 |         0 |  2 | DE     |          2
  1 | person     | 1         |  3 | 2000-01-01 16:32:00+03:30 | 2000-01-01 17:32:00+03:30 |         1 |  3 | BA     |          3
  1 | person     | 1         |  4 | 2000-02-01 16:32:00+03:30 | 2000-02-01 17:32:00+03:30 |         1 |  4 | DE     |          4
  1 | person     | 1         |  5 | 2000-03-01 16:32:00+03:30 | 2000-03-01 17:32:00+03:30 |         1 |  5 | DE     |          5
  2 | person     | 2         |  6 | 2000-01-01 16:32:00+03:30 | 2000-01-01 17:32:00+03:30 |         2 |  6 | DE     |          6
  2 | person     | 2         |  7 | 2000-02-01 16:32:00+03:30 | 2000-02-01 17:32:00+03:30 |         2 |  7 | DE     |          7
  2 | person     | 2         |  8 | 2000-03-01 16:32:00+03:30 | 2000-03-01 17:32:00+03:30 |         2 |  8 | BA     |          8

Edit3

I try using prefetch_related to join tables and get results as I expected because I thought that extra join causes this problem but this did not work and I still get the same result and this had not any effects.

Edit4

This issue has the same problem.

回答1:

I don't have a solution yet; but I thought a concise summary of the problem will set more and better minds than mine at work!

From what I understand; your core issue is a result of two pre-conditions:

  1. The fact that you have two discrete filters defined on a related model; resulting in filter spanning-multi-valued-relationships
  2. The way FilterSet implements filtering

Let us look at these in more detail:

filter spanning-multi-valued-relationships

This is a great resource to understand issue pre-condition #1 better: https://docs.djangoproject.com/en/2.0/topics/db/queries/#spanning-multi-valued-relationships

Essentially, the start_time filter adds a .filter(sessions__start_time=value) to your Queryset, and the billing_status filter adds a .filter(sessions_billing_status=value) to the filter. This results in the "spanning-multi-valued-relationships" issue described above, meaning it will do an OR between these filters instead of an AND as you require it to.

This got me thinking, why don't we see the same issue in the start_time filter; but the trick here is that it is defined as a DateFromToRangeFilter; it internally uses a single filter query with the __range= construct. If instead it did sessions__start_time__gt= and sessions__start_time__lt=, we would have the same issue here.

The way FilterSet implements filtering

Talk is cheap; show me the code

@property
def qs(self):
    if not hasattr(self, '_qs'):
        if not self.is_bound:
            self._qs = self.queryset.all()
            return self._qs

        if not self.form.is_valid():
            if self.strict == STRICTNESS.RAISE_VALIDATION_ERROR:
                raise forms.ValidationError(self.form.errors)
            elif self.strict == STRICTNESS.RETURN_NO_RESULTS:
                self._qs = self.queryset.none()
                return self._qs
            # else STRICTNESS.IGNORE...  ignoring

        # start with all the results and filter from there
        qs = self.queryset.all()
        for name, filter_ in six.iteritems(self.filters):
            value = self.form.cleaned_data.get(name)

            if value is not None:  # valid & clean data
                qs = filter_.filter(qs, value)

        self._qs = qs

    return self._qs

As you can see, the qs property is resolved by iterating over a list of Filter objects, passing the initial qs through each of them successively and returning the result. See qs = filter_.filter(qs, value)

Each Filter object here defines a specific def filter operation, that basically takes teh Queryset and then adds a successive .filter to it.

Here's an example from the BaseFilter class

   def filter(self, qs, value):
        if isinstance(value, Lookup):
            lookup = six.text_type(value.lookup_type)
            value = value.value
        else:
            lookup = self.lookup_expr
        if value in EMPTY_VALUES:
            return qs
        if self.distinct:
            qs = qs.distinct()
        qs = self.get_method(qs)(**{'%s__%s' % (self.name, lookup): value})
        return qs

The line of code that matters is: qs = self.get_method(qs)(**{'%s__%s' % (self.name, lookup): value})

So the two pre-conditions create the perfect storm for this issue.