I have a table that looks like this:
+-----+-------+------------+
| id  | value | date       |
+-----+-------+------------+
| id1 |  1499 | 2012-05-10 |
| id1 |  1509 | 2012-05-11 |
| id1 |  1511 | 2012-05-12 |
| id1 |  1515 | 2012-05-13 |
| id1 |  1522 | 2012-05-14 |
| id1 |  1525 | 2012-05-15 |
| id2 |  2222 | 2012-05-10 |
| id2 |  2223 | 2012-05-11 |
| id2 |  2238 | 2012-05-13 |
| id2 |  2330 | 2012-05-14 |
| id2 |  2340 | 2012-05-15 |
| id3 |  1001 | 2012-05-10 |
| id3 |  1020 | 2012-05-11 |
| id3 |  1089 | 2012-05-12 |
| id3 |  1107 | 2012-05-13 |
| id3 |  1234 | 2012-05-14 |
| id3 |  1556 | 2012-05-15 |
| ... |   ... | ...        |
| ... |   ... | ...        |
| ... |   ... | ...        |
+-----+-------+------------+
What I want is the total of the value column per date, across all the data in this table. There is at most one entry for each id per day. The problem is that some ids don't have a value for every day, e.g. id2 has no value for 2012-05-12.
When there is no value for a specific id on a given date, I want the value from the closest previous date to be used in the sum instead.
For example, suppose we have only the data shown above. We can get the sum of all values for a specific date with this query:
SELECT SUM(value) FROM mytable WHERE date='2012-05-12';
The result will be: 1511 + 1089 = 2600
But what I want is a query that does this calculation: 1511 + 2223 + 1089 = 4823
so that the 2223 of id2 from 2012-05-11 is added in place of the missing value:
| id2 | 2223 | 2012-05-11 |
Do you know how I can do this with an SQL query, or with a script (e.g. Python)?
I have thousands of ids per date, so I would like the query to be reasonably fast if possible.
It's not pretty, as it has to join four copies of your table to itself, which could hit all sorts of performance pain (I strongly advise you to have indexes on id and date)... but this will do the trick. See it on sqlfiddle.
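The query from this answer did not survive in the text, so what follows is only a sketch of one way a self-join like the one described could look, not the original. It pairs every distinct date with every distinct id, then picks that id's most recent row on or before the date. It runs against an in-memory SQLite database through Python's sqlite3 module so that it can be tried as-is; the index name idx_id_date is made up, and I am assuming dates are stored as ISO-formatted text (which sorts correctly as strings). The same SQL should carry over to other engines with little change, but I have only reasoned about it on SQLite.

```python
import sqlite3

# Sample rows from the question: (id, value, date). id2 has no row for 2012-05-12.
ROWS = [
    ("id1", 1499, "2012-05-10"), ("id1", 1509, "2012-05-11"),
    ("id1", 1511, "2012-05-12"), ("id1", 1515, "2012-05-13"),
    ("id1", 1522, "2012-05-14"), ("id1", 1525, "2012-05-15"),
    ("id2", 2222, "2012-05-10"), ("id2", 2223, "2012-05-11"),
    ("id2", 2238, "2012-05-13"), ("id2", 2330, "2012-05-14"),
    ("id2", 2340, "2012-05-15"),
    ("id3", 1001, "2012-05-10"), ("id3", 1020, "2012-05-11"),
    ("id3", 1089, "2012-05-12"), ("id3", 1107, "2012-05-13"),
    ("id3", 1234, "2012-05-14"), ("id3", 1556, "2012-05-15"),
]

# d = every date, i = every id, t = the row to sum,
# p = lookup of the latest date for that id on or before d.date.
QUERY = """
SELECT d.date, SUM(t.value) AS total
FROM (SELECT DISTINCT date FROM mytable) AS d
CROSS JOIN (SELECT DISTINCT id FROM mytable) AS i
JOIN mytable AS t
  ON t.id = i.id
 AND t.date = (SELECT MAX(p.date)
               FROM mytable AS p
               WHERE p.id = i.id AND p.date <= d.date)
GROUP BY d.date
ORDER BY d.date
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id TEXT, value INTEGER, date TEXT)")
# Composite index so the MAX(date) lookup per id does not scan the whole table.
conn.execute("CREATE INDEX idx_id_date ON mytable (id, date)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

totals = dict(conn.execute(QUERY).fetchall())
conn.close()

print(totals["2012-05-12"])  # 1511 + 2223 + 1089 = 4823
```

An id with no row on or before a given date simply contributes nothing to that date's total, because the lookup returns NULL and the join drops it.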
The SQL solution I can think of for this is not very pretty (a sub-select inside a CASE statement on the value column, with a right join to a dates sequence table), so I'll go with the Python version:
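The original script is missing here, so this is a minimal sketch of what such a Python version might look like rather than the answer's own code. It fetches all rows ordered by date, remembers the latest value seen for each id in a dictionary, and sums that dictionary once per date. The function name carry_forward_sums is made up, and sqlite3 with an in-memory table stands in for whatever database driver you actually use.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter


def carry_forward_sums(rows):
    """rows: (date, id, value) tuples sorted by date. Returns {date: total}."""
    latest = {}   # most recent value seen so far for each id
    totals = {}
    for date, group in groupby(rows, key=itemgetter(0)):
        for _, id_, value in group:
            latest[id_] = value
        # ids without a row on this date keep their previous value
        totals[date] = sum(latest.values())
    return totals


# A three-day slice of the question's data; id2 has no row for 2012-05-12.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id TEXT, value INTEGER, date TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    ("id1", 1509, "2012-05-11"), ("id2", 2223, "2012-05-11"),
    ("id3", 1020, "2012-05-11"), ("id1", 1511, "2012-05-12"),
    ("id3", 1089, "2012-05-12"), ("id1", 1515, "2012-05-13"),
    ("id2", 2238, "2012-05-13"), ("id3", 1107, "2012-05-13"),
])
rows = conn.execute(
    "SELECT date, id, value FROM mytable ORDER BY date"
).fetchall()
conn.close()

totals = carry_forward_sums(rows)
print(totals["2012-05-12"])  # 1511 + 2223 + 1089 = 4823
```

Summing the whole dictionary on every date costs one pass over all ids per date, which is fine for thousands of ids; a running total avoids even that.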
Alternative row-by-row streaming loop:
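The loop itself is missing from the text, so here is a hedged sketch of how a streaming variant could work, under the same assumptions as before (sqlite3 as a stand-in driver, ISO date strings). It iterates the cursor directly instead of calling fetchall(), keeps a running total that is adjusted by the difference between an id's new and previous value, and yields each date's total as soon as the next date starts. The generator name stream_totals is hypothetical.

```python
import sqlite3


def stream_totals(cursor):
    """Yield (date, total) from a cursor of (date, id, value) ordered by date."""
    latest = {}      # last value seen per id
    running = 0      # sum of latest.values(), maintained incrementally
    current = None   # date currently being accumulated
    for date, id_, value in cursor:
        if current is not None and date != current:
            yield current, running
        current = date
        running += value - latest.get(id_, 0)
        latest[id_] = value
    if current is not None:
        yield current, running


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id TEXT, value INTEGER, date TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    ("id1", 1509, "2012-05-11"), ("id2", 2223, "2012-05-11"),
    ("id3", 1020, "2012-05-11"), ("id1", 1511, "2012-05-12"),
    ("id3", 1089, "2012-05-12"), ("id1", 1515, "2012-05-13"),
    ("id2", 2238, "2012-05-13"), ("id3", 1107, "2012-05-13"),
])
cursor = conn.execute("SELECT date, id, value FROM mytable ORDER BY date")
# Consume the generator before closing: it reads from the cursor lazily.
results = list(stream_totals(cursor))
conn.close()

for date, total in results:
    print(date, total)
```

Only the per-id dictionary is held in memory, so the size of the table matters much less than the number of distinct ids.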
Don't forget to close the connection when you're done!
You might want to think about the semantics of your date column a bit more.
Perhaps you should add a column and make your date a range instead.
Anything you do that does not involve data from the record is likely to be slow. A literal interpretation of your request would potentially require a date traversal for each value to sum.
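To make the range idea concrete, here is a rough sketch under assumptions of my own: a hypothetical table ranged with columns valid_from (inclusive) and valid_to (exclusive), where the newest row for each id gets a far-future sentinel as its end date. With that layout, every row that applies to a given day can be found from the row's own data, so the per-date sum needs no lookup of earlier dates. It uses sqlite3 in memory only so it can be run as-is.

```python
import sqlite3

OPEN_END = "9999-12-31"  # sentinel end date for each id's newest row

# (id, value, valid_from, valid_to); id2's 2223 stays valid through 2012-05-12.
ROWS = [
    ("id1", 1509, "2012-05-11", "2012-05-12"),
    ("id1", 1511, "2012-05-12", "2012-05-13"),
    ("id1", 1515, "2012-05-13", OPEN_END),
    ("id2", 2223, "2012-05-11", "2012-05-13"),
    ("id2", 2238, "2012-05-13", OPEN_END),
    ("id3", 1020, "2012-05-11", "2012-05-12"),
    ("id3", 1089, "2012-05-12", "2012-05-13"),
    ("id3", 1107, "2012-05-13", OPEN_END),
]


def total_on(conn, day):
    """Sum the values whose [valid_from, valid_to) range contains day."""
    row = conn.execute(
        "SELECT SUM(value) FROM ranged WHERE valid_from <= ? AND ? < valid_to",
        (day, day),
    ).fetchone()
    return row[0]  # None when no row covers the day


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ranged (id TEXT, value INTEGER, valid_from TEXT, valid_to TEXT)"
)
conn.executemany("INSERT INTO ranged VALUES (?, ?, ?, ?)", ROWS)

total_gap_day = total_on(conn, "2012-05-12")   # 1511 + 2223 + 1089 = 4823
total_before = total_on(conn, "2012-05-10")    # nothing valid yet
conn.close()
print(total_gap_day)
```

The cost moves to write time: inserting a new value for an id also means closing the previous row's range by updating its valid_to.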