Missing 'Median' Aggregate Function in Django

Posted 2020-07-06 02:48

The development version of Django has aggregate functions like Avg, Count, Max, Min, StdDev, Sum, and Variance. Is there a reason Median is missing from the list?

Implementing one seems like it would be easy. Am I missing something? How much are the aggregate functions doing behind the scenes?

6 Answers
女痞
#2 · 2020-07-06 03:21

Here's your missing function. Pass it a queryset and the name of the column that you want to find the median for:

def median_value(queryset, term):
    count = queryset.count()
    if count == 0:
        return None
    # Integer division picks the middle index (the upper middle when count is even).
    return queryset.values_list(term, flat=True).order_by(term)[count // 2]

That wasn't as hard as some of the other responses seem to indicate. The important thing is to let the database do all of the sorting: the function issues one COUNT query plus one ordered lookup at an offset, so if the column is already indexed, this is a cheap operation.
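For illustration, here is roughly what that approach amounts to at the SQL level, sketched with Python's built-in sqlite3 module rather than the Django ORM. The table `score`, the column `points`, and the helper `median_row_value` are made up for this example, and the exact SQL that Django generates depends on the backend.

```python
import sqlite3


def median_row_value(conn, table, column):
    """Return the value at sorted position count // 2, or None for an empty table."""
    # table and column are interpolated into the SQL text, so they must be
    # trusted identifiers (hypothetical helper, not part of Django).
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if count == 0:
        return None
    # The database sorts; only the single middle row is fetched.
    row = conn.execute(
        f"SELECT {column} FROM {table} ORDER BY {column} LIMIT 1 OFFSET ?",
        (count // 2,),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score (points INTEGER)")
conn.executemany(
    "INSERT INTO score (points) VALUES (?)",
    [(7,), (1,), (9,), (3,), (5,)],
)
middle = median_row_value(conn, "score", "points")
```

With an index on the column, the database can usually satisfy the ORDER BY from the index instead of sorting the whole table.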

(Update 1/28/2016) If you want to be stricter about the definition of the median for an even number of items, this version averages the two middle values.

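from decimal import Decimal

def median_value(queryset, term):
    count = queryset.count()
    if count == 0:
        return None
    values = queryset.values_list(term, flat=True).order_by(term)
    if count % 2 == 1:
        # Odd count: the single middle value.
        return values[count // 2]
    # Even count: average the two middle values.
    # Decimal(2) assumes the column holds Decimal or int values; divide by 2.0 for floats.
    mid = count // 2
    return sum(values[mid - 1:mid + 1]) / Decimal(2)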
祖国的老花朵
#3 · 2020-07-06 03:21

A strong possibility is that median is not part of standard SQL.

Also, it requires a sort, making it quite expensive to compute.

放荡不羁爱自由
#4 · 2020-07-06 03:32

I have no idea which database backend you are using, but if your database supports another aggregate, or you can find a clever way of computing it, you can probably access it easily through Django's Aggregate class.

在下西门庆
#5 · 2020-07-06 03:38

FWIW, you can extend PostgreSQL 8.4 and above to have a median aggregate function with these code snippets.

Other code snippets (which work for older versions of PostgreSQL) are shown here. Be sure to read the comments for this resource.
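The PostgreSQL snippets themselves are not reproduced here. As a rough analogue only, the same idea of teaching the database a median aggregate can be sketched with SQLite, using `create_aggregate` from Python's built-in sqlite3 module. The names `py_median`, `MedianAggregate`, and the `score` table are invented for this example.

```python
import sqlite3
import statistics


class MedianAggregate:
    """User-defined aggregate: collects every value in a group, then takes the median."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per row; NULLs are skipped, as SQL aggregates usually do.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # Called once per group, after all rows have been seen.
        return statistics.median(self.values) if self.values else None


conn = sqlite3.connect(":memory:")
conn.create_aggregate("py_median", 1, MedianAggregate)
conn.execute("CREATE TABLE score (team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO score VALUES (?, ?)",
    [("a", 1), ("a", 3), ("a", 9), ("b", 2), ("b", 4)],
)
rows = conn.execute(
    "SELECT team, py_median(points) FROM score GROUP BY team ORDER BY team"
).fetchall()
```

Note that the aggregate has to keep every value of a group in memory until `finalize` runs, unlike Sum or Avg.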

Bombasti
#6 · 2020-07-06 03:42

Well, the reason is probably that you need to keep track of all the numbers to calculate the median. Avg, Count, Max, Min, StdDev, Sum, and Variance can all be calculated with constant storage. That is, once you "record" a number you'll never need it again.

FWIW, the variables you need to track are: min, max, count, <n> = avg, and <n^2> = avg of the squares of the values. The variance then follows as <n^2> - <n>^2, and the standard deviation is its square root.
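A minimal sketch of that constant-storage bookkeeping, in plain Python. The class name `RunningStats` is made up for illustration, and it keeps running sums rather than running averages, which carries the same information. No value is stored after it has been recorded, which is exactly what a median cannot get away with.

```python
class RunningStats:
    """Single-pass statistics using a fixed number of variables."""

    def __init__(self):
        self.count = 0
        self.total = 0.0      # sum of the values
        self.total_sq = 0.0   # sum of the squared values
        self.minimum = None
        self.maximum = None

    def add(self, x):
        # Update the accumulators; x itself is not kept.
        self.count += 1
        self.total += x
        self.total_sq += x * x
        self.minimum = x if self.minimum is None else min(self.minimum, x)
        self.maximum = x if self.maximum is None else max(self.maximum, x)

    @property
    def mean(self):
        return self.total / self.count

    @property
    def variance(self):
        # Population variance: <n^2> - <n>^2
        return self.total_sq / self.count - self.mean ** 2


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.add(value)
```

This sum-of-squares form can lose precision when the values are large relative to their spread, so treat it as an illustration of the storage argument rather than a numerically robust implementation.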

Rolldiameter
#7 · 2020-07-06 03:46

Because median isn't a SQL aggregate. See, for example, the list of PostgreSQL aggregate functions and the list of MySQL aggregate functions.
