django queryset aggregation count counting wrong t

2019-07-12 14:17发布

问题:

This is a continuation question from:
Django queryset get distinct column values with respect to other column

My Problem:

Using aggregate count in Django counts the wrong thing in the queryset, or as far as I can see something that is not even in my queryset.

What I did I used:

queryset.order_by('col1', 'col2').distinct('col1', 'col2').values('col2')

to get the values of col2 of a model where all the rows have a distinct pair of values in (col1, col2). There is an example in the link above. I printed my queryset and it looks good, I have

[{'col2': value1}, ... , {'col2': value1},{'col2': value2}, ..., {'col2': value2},...]

I now want to count how much each value appears in the queryset I got. I do this using aggregation count. I have:

a = {'count': Count(F(col2), distinct=False)}
queryset.annotate(**a)

I tried this with ``distinct=True` as well but no luck

I would expect to get [{col2:value1, count: num1}, {col2:value2, count: num2} ...].
Instead I get [{col2: value1, count: num11}, {col2: value1, count: num12}, ... ,{col2: value1, count: num_1n}, {col2: value2, count: num21}, ... ,{col2: value1, count: num_2n}, ...]. Where as far as I can tell num11, ...., num_1n are the amount of lines value1 existed in col2 with any specific value in col1, previous to me using order_by('col1', 'col2').distinct('col1', 'col2').values('col2') when preparing the query.

I can't figure out what can cause this. I tried looking in the queryset.query parameter but I can't understand where I am going wrong.

Any help would be greatly appreciated.

回答1:

The .order_by should specify only 'col2', like:

queryset.values('col2').annotate(
    count=Count('col1', distinct=True)
).order_by('col2')

This will thus yield a QuerySet that looks like:

< QuerySet [
    {'col2': 1, 'count': 4 },
    {'col2': 2, 'count': 2 }
]>

So that means that there are four distinct values for col1 given col2 has value 1, and two distinct values for col1 given col2 has value 2.

This will construct a query like:

SELECT col2, COUNT(DISTINCT col1) AS count
FROM some_table
GROUP BY col2
ORDER BY col2

The .distinct(..) is here not necessary since due to the GROUP BY we will only obtain distinct col2 values, and because we COUNT(DISTINCT ..) this thus means that each distinct value of col1 is counted once.