
How Do I Aggregate Data By Day and Still Respect Timezones?

Published 2020-06-10 06:07

Question:

We are currently using a summary table that aggregates information for our users on an hourly basis in UTC time. The problem is that this table is becoming too large and is slowing our system down immensely. We have applied all the tuning techniques recommended for PostgreSQL and are still experiencing slowness.

Our idea was to start aggregating by day rather than by hour, but the problem is that we allow our customers to change their timezone, which requires the data for that day to be recalculated.

Does anyone know of a way to store the daily summary but still respect the numbers and totals when they switch timezones?

Answer 1:

Summarise the data in tables with a timeoffset column and a "day" field (a date) that is the day for that particular summary line. Index on (timeoffset, day, other relevant fields), clustered if possible (PostgreSQL has no clustered indexes as such, but the CLUSTER command gives a one-off physical ordering by an index), and all should be well.
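A minimal sketch of what that layout might look like in PostgreSQL; the table and column names here are hypothetical, not taken from the question:

```sql
-- Hypothetical daily summary table with an explicit UTC offset per row.
CREATE TABLE daily_summary (
    user_id     bigint  NOT NULL,
    timeoffset  integer NOT NULL,   -- UTC offset (in hours) this row was computed for
    day         date    NOT NULL,   -- local calendar day being summarised
    event_count bigint  NOT NULL DEFAULT 0,
    total_value numeric NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, timeoffset, day)
);

-- Index ordered as suggested: timeoffset first, then day, then other fields.
CREATE INDEX daily_summary_offset_day_idx
    ON daily_summary (timeoffset, day, user_id);

-- PostgreSQL has no clustered indexes in the SQL Server sense, but CLUSTER
-- physically reorders the table by an index as a one-off operation.
CLUSTER daily_summary USING daily_summary_offset_day_idx;
```

Note that CLUSTER is a one-time reorder, so it would have to be re-run periodically as new summary rows arrive.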



Answer 2:

I'm assuming you've already gone through all the partitioning considerations, such as partitioning by user.

I can see several solutions to your problem, depending on the usage pattern.

  1. Aggregate data per day, per user selection. When a user changes timezone, programmatically recalculate the aggregates for that user (see the sketch after this list). This is plausible if timezone changes are infrequent and a short delay in the data is acceptable when a user changes timezones.

  2. If you have relatively few measures, you may maintain 24 columns for each measure - each describing the daily aggregate for the measure in a different timezone.

  3. If timezone changes are frequent and there are numerous measures, it seems like 24 different aggregate tables would be the way to go.
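As an illustration of option 1, here is a hedged sketch of rebuilding one user's daily rows from the existing hourly UTC table after a timezone change. The table names (hourly_summary, user_daily_totals), column names, and the :user_id and :tz parameters are all hypothetical; hour_utc is assumed to be a timestamptz column:

```sql
-- Rebuild one user's daily aggregates for a new timezone from the hourly
-- UTC summary. :user_id and :tz (e.g. 'America/New_York') are parameters.
BEGIN;

DELETE FROM user_daily_totals
 WHERE user_id = :user_id;

INSERT INTO user_daily_totals (user_id, day, event_count, total_value)
SELECT user_id,
       (hour_utc AT TIME ZONE :tz)::date AS day,   -- bucket by local day in the new zone
       sum(event_count),
       sum(total_value)
  FROM hourly_summary
 WHERE user_id = :user_id
 GROUP BY user_id, (hour_utc AT TIME ZONE :tz)::date;

COMMIT;
```

The hourly UTC table is kept as the source of truth, so switching timezones only costs one re-aggregation per user rather than a full recomputation of the summary table.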



Answer 3:

I ran into this problem too. My approach: columns of date type use the local timezone, while columns of datetime type use UTC, because the statistics are indexed by local date. Another reason is that, for now, we only have data from a single local timezone.
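One way to read that convention, sketched as a hypothetical table (the column names and the 'Asia/Shanghai' timezone literal are only examples):

```sql
-- Hypothetical event table: the timestamp is stored in UTC (timestamptz),
-- while the date column used for daily statistics is the local calendar day.
CREATE TABLE events (
    id         bigserial   PRIMARY KEY,
    created_at timestamptz NOT NULL DEFAULT now(),   -- stored in UTC
    stat_day   date        NOT NULL
                   DEFAULT ((now() AT TIME ZONE 'Asia/Shanghai')::date)
);

-- Daily statistics then group or index on the local-day column directly.
CREATE INDEX events_stat_day_idx ON events (stat_day);
```

This only works cleanly while all reporting is done in a single local timezone, which is the limitation this answer acknowledges.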