I have some data in a sql database and I'd like to calculate the slope. The data has this layout:
Date | Keyword | Score
2012-01-10 | ipad | 0.12
2012-01-11 | ipad | 0.17
2012-01-12 | ipad | 0.24
2012-01-10 | taco | 0.19
2012-01-11 | taco | 0.34
2012-01-12 | taco | 0.45
I'd like the final output to look like this by creating a new table using SQL:
Date | Keyword | Score | Slope
2012-01-10 | ipad | 0.12 | 0.06
2012-01-11 | ipad | 0.17 | 0.06
2012-01-12 | ipad | 0.24 | 0.06
2012-01-10 | taco | 0.19 | 0.13
2012-01-11 | taco | 0.34 | 0.13
2012-01-12 | taco | 0.45 | 0.13
To complicate things, not all Keywords have 3 dates worth of data, some have only 2 for instance.
The simpler the SQL the better since my database is proprietary and I'm not quite sure what formulas are available, although I know it can do OVER(PARTITION BY) if that helps. Thank you!
UPDATE: I define the slope as best fit y=mx+p aka in excel it would be =slope()
Here is another actual example that I usually manipulate in excel:
date keyword score slope
1/22/2012 water bottle 0.010885442 0.000334784
1/23/2012 water bottle 0.011203949 0.000334784
1/24/2012 water bottle 0.008460835 0.000334784
1/25/2012 water bottle 0.010363991 0.000334784
1/26/2012 water bottle 0.011800716 0.000334784
1/27/2012 water bottle 0.012948411 0.000334784
1/28/2012 water bottle 0.012732459 0.000334784
1/29/2012 water bottle 0.011682568 0.000334784
The cleanest one I could make:
It uses Simple Linear Regression to calculate the slope.
Result:
Every database system seems to have a different approach to converting dates to numbers:
TO_SECONDS(date)
orTO_DAYS(date)
TO_NUMBER(TO_CHAR(date, 'J'))
ordate - TO_DATE('1','yyyy')
CAST(date AS float)
(or equivalentCONVERT
)If you're defining slope as just the slope from the earliest point to the latest point, and if score only increases with date, then you can get the output above with this:
However if you want linear regression, or if scores can decrease as well as increase with time, you'll need something more complex.
Cast to decimal does not give correct results for me, it is not linear to the dates. Use
TO_DAYS(date_field)
instead, this becomes correct.