I have monthly sales data like this:
Company Month Sales
Adidas 2018-09 100
Adidas 2018-08 95
Adidas 2018-07 120
Adidas 2018-06 155
...and so on
I need to add another column containing the median of sales over the past 12 months
(or over as many months as are available, if there are fewer than 12).
In Python I figured out how to do it with for loops,
but I'm not sure how to do it in BigQuery.
Thank you!
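Here is an approach that might work. It defines a temporary MEDIAN function that sorts an array and returns the middle element (or the average of the two middle elements when the length is even), then applies it to a trailing window of up to 12 rows per company. Note that ROWS BETWEEN 11 PRECEDING AND CURRENT ROW counts rows, not calendar months, so it assumes one row per company per month with no gaps: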
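-- Median of an array: sort it, then take the middle element
-- (or the mean of the two middle elements when the length is even).
CREATE TEMP FUNCTION MEDIAN(arr ANY TYPE) AS ((
  SELECT
    IF(
      MOD(ARRAY_LENGTH(arr), 2) = 0,
      (arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2) - 1)] + arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))]) / 2,
      arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))]
    )
  -- In the SELECT list above, arr refers to this sorted copy of the argument.
  FROM (SELECT ARRAY_AGG(x ORDER BY x) AS arr FROM UNNEST(arr) AS x)
));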
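SELECT
  Company,
  Month,
  -- Collect the current row and up to 11 preceding rows for each company.
  MEDIAN(
    ARRAY_AGG(Sales) OVER (PARTITION BY Company ORDER BY Month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW)
  ) AS trailing_median
FROM (
  -- Sample rows from the question.
  SELECT 'Adidas' AS Company, '2018-09' AS Month, 100 AS Sales UNION ALL
  SELECT 'Adidas', '2018-08', 95 UNION ALL
  SELECT 'Adidas', '2018-07', 120 UNION ALL
  SELECT 'Adidas', '2018-06', 155
)
-- Without an explicit ORDER BY, the order of the output rows is not guaranteed.
ORDER BY Company, Month;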
The results are:
+---------+---------+-----------------+
| Company | Month   | trailing_median |
+---------+---------+-----------------+
| Adidas  | 2018-06 |           155.0 |
| Adidas  | 2018-07 |           137.5 |
| Adidas  | 2018-08 |           120.0 |
| Adidas  | 2018-09 |           110.0 |
+---------+---------+-----------------+
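
Since the question mentions a Python version, the figures above can be cross-checked with a short Python sketch. This is only a reference implementation under the same assumption as the query (one row per company per month, so a window of 12 rows equals 12 months). The function name trailing_medians and its window parameter are made up for illustration and are not part of BigQuery or any library.

```python
from collections import defaultdict
from statistics import median


def trailing_medians(rows, window=12):
    """Map (company, month) to the median of the last `window` rows up to that month.

    Hypothetical helper for cross-checking; assumes one row per company per month.
    """
    by_company = defaultdict(list)
    for company, month, sales in rows:
        by_company[company].append((month, sales))

    result = {}
    for company, series in by_company.items():
        series.sort()  # 'YYYY-MM' strings sort chronologically
        for i, (month, _) in enumerate(series):
            start = max(0, i - window + 1)  # fewer rows early in the series
            recent = [sales for _, sales in series[start:i + 1]]
            result[(company, month)] = float(median(recent))
    return result


# Sample rows from the question.
rows = [
    ("Adidas", "2018-09", 100),
    ("Adidas", "2018-08", 95),
    ("Adidas", "2018-07", 120),
    ("Adidas", "2018-06", 155),
]

medians = trailing_medians(rows)
for (company, month) in sorted(medians):
    print(company, month, medians[(company, month)])
```

For the four sample rows this yields the same four medians as the query. If some months can be missing, both the sketch and the ROWS-based window would need to work on calendar months rather than row counts.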