-->

django aggregation to lower resolution using group

2019-02-17 12:15发布

问题:

horrible title, but let me explain: i've got this django model containing a timestamp (date) and the attribute to log - f.e. the number of users consuming some ressource - (value).

class Viewers(models.Model):
    date = models.DateTimeField()
    value = models.IntegerField()

for each 10seconds the table contains the number of users. something like this:

| date | value |
|------|-------|
|  t1  |   15  |
|  t2  |   18  |
|  t3  |   27  |
|  t4  |   25  |
|  ..  |   ..  |
|  t30 |   38  |
|  t31 |   36  |
|  ..  |   ..  |

now i want to generate different statistics from this data, each with another resolution. f.e. for a chart of the last day i don't need the 10second resolution, so i want 5 minute steps (that are build by averaging the values (and maybe also the date) of the rows from t1 to t29, t30 to t59, ...), so that i'll get:

| date | value |
|------|-------|
|  t15 |   21  |
|  t45 |   32  |
|  ..  |   ..  |

the attributes to keep variable are start & end timestamp and the resolution (like 5 minutes). is there a way using the django orm/queryset api and if not, how to reach this with custom sql?

回答1:

I've been trying to solve this problem in the most 'django' way possible. I've settled for the following. It averages the values for 15minute time slots between start_date and end_date where the column name is'date':

readings = Reading.objects.filter(date__range=(start_date, end_date)) \
   .extra(select={'date_slice': "FLOOR (EXTRACT (EPOCH FROM date) / '900' )"}) \
   .values('date_slice') \
   .annotate(value_avg=Avg('value'))

It returns a dictionary:

 {'value_avg': 1116.4925373134329, 'date_slice': 1546512.0}
 {'value_avg': 1001.2028985507246, 'date_slice': 1546513.0}
 {'value_avg': 1180.6285714285714, 'date_slice': 1546514.0}

The core of the idea comes from this answer to the same question for PHP/SQL. The code passed to extra is for a Postgres DB.



回答2:

from django.db.models import Avg

Viewers.objects.filter(date__range=(start_time, end_time)).aggregate(average=Avg('value'))

That will get you the average of all the values between start_time and end_time, returned as a dictionary in the form of { 'average': <the average> }.

start_time and end_time need to be Python datetime objects. So if you have a timestamp, or something, you'll need to convert it first. You can also use datetime.timedelta to calculate the end_time based on the start_time. For a five minute resolution, something like this:

from datetime import timedelta

end_time = start_time + timedelta(minutes=5)


回答3:

have you looked at the range filter?

https://docs.djangoproject.com/en/dev/ref/models/querysets/#range

The example given in the doc's seems similar to your situation.



回答4:

After long trying i made it as SQL-statement:

SELECT FROM_UNIXTIME(AVG(UNIX_TIMESTAMP(date))), SUM(value)
FROM `my_table`
WHERE date BETWEEN SUBTIME(NOW( ), '0:30:00') AND NOW()
GROUP BY UNIX_TIMESTAMP(date) DIV 300
ORDER BY date DESC

with

start_time = SUBTIME(NOW( ), '0:30:00')
end_time = NOW()
period = 300 # in seconds

in the end - not really hard - and indeed independent from the time resolution of the samplings in the origin table.