We have two columns, id and month_id.
The output I'm looking for splits each month value into a year and a quarter, with one row per id, year and quarter. The ACTIVITY column should be computed per quarter: 1 if the id is active in that quarter, 0 otherwise. If the id appears in any month of a quarter (e.g. in only one of the three months of Q1), the activity for that quarter is still 1.
Sample input:
```
id          month_dt
----------  ---------------------
1000000000  2012-03-01 00:00:00.0
1000000000  2015-09-01 00:00:00.0
1000000000  2016-10-01 00:00:00.0
1000000000  2015-11-01 00:00:00.0
1000000000  2014-01-01 00:00:00.0
1000000000  2013-04-01 00:00:00.0
1000000000  2014-12-01 00:00:00.0
1000000000  2015-02-01 00:00:00.0
1000000000  2014-06-01 00:00:00.0
1000000000  2013-01-01 00:00:00.0
1000000000  2014-05-01 00:00:00.0
1000000000  2016-05-01 00:00:00.0
1000000000  2013-07-01 00:00:00.0
```
Expected output (only 2012 and 2013 shown):
```
ID          YEAR  QTR  ACTIVITY (1 or 0)
----------  ----  ---  -----------------
1000000000  2012  1    1
1000000000  2012  2    0
1000000000  2012  3    0
1000000000  2012  4    0
1000000000  2013  1    1
1000000000  2013  2    1
1000000000  2013  3    1
1000000000  2013  4    0
```
Below is what I tried, but it doesn't return the expected results. How can I achieve this?
```sql
-- Returns one row per (id, year) with per-quarter row counts in columns Q1..Q4
SELECT
  a.id, a.year,
  SUM(CASE WHEN quarter BETWEEN 1 AND 3 THEN 1 ELSE 0 END) AS Q1,
  SUM(CASE WHEN quarter BETWEEN 4 AND 6 THEN 1 ELSE 0 END) AS Q2,
  SUM(CASE WHEN quarter BETWEEN 7 AND 9 THEN 1 ELSE 0 END) AS Q3,
  SUM(CASE WHEN quarter BETWEEN 10 AND 12 THEN 1 ELSE 0 END) AS Q4
FROM
  (SELECT
     id,
     -- first four characters, e.g. '2012'
     TRIM(SUBSTRING(month_id, 1, 4)) AS year,
     -- characters 5-8 ('-03-') with the dashes removed: the month, e.g. '03'
     TRIM(regexp_replace(SUBSTR(month_id, 5, 4), "-", "")) AS quarter
   FROM
     test.patientid) a
GROUP BY
  a.id, a.year
```
I think you are looking for something like this:
This assumes that every quarter has at least one activity somewhere in the data (in any year), so the list of quarters can be taken from the table itself. Otherwise, you just need to supply the list 1, 2, 3 and 4 to get the quarters.
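A minimal sketch of the explicit-list variant, assuming Hive 1.3.0 or later (for the built-in quarter() function) and the table and column names from the question's attempt (test.patientid, id, month_id). This is not the answerer's original query; the CTE names act, id_years and quarters are illustrative. It builds every (id, year, quarter) combination for the years in which each id has data, then left-joins the quarters that actually occur, so missing quarters come out as 0.

```sql
-- Sketch only: assumes Hive 1.3.0+ (quarter()) and the question's
-- table test.patientid with columns id and month_id.
WITH act AS (
  -- every (id, year, quarter) that has at least one row
  SELECT DISTINCT
         id,
         year(month_id)    AS yr,
         quarter(month_id) AS qtr
  FROM test.patientid
),
id_years AS (
  -- the years in which each id has at least one row
  SELECT DISTINCT id, yr
  FROM act
),
quarters AS (
  -- explicit list 1..4, so quarters absent from the data still get a row
  SELECT explode(array(1, 2, 3, 4)) AS qtr
)
SELECT iy.id,
       iy.yr AS year,
       q.qtr,
       -- 1 if any month of this quarter appears for the id, else 0
       CASE WHEN a.id IS NULL THEN 0 ELSE 1 END AS activity
FROM id_years iy
CROSS JOIN quarters q
LEFT JOIN act a
  ON a.id = iy.id
 AND a.yr = iy.yr
 AND a.qtr = q.qtr
ORDER BY id, year, qtr;
```

Because act is deduplicated, each generated row matches at most one activity row, so no extra aggregation is needed. For the sample data, 2012 has only month 3 (Q1) and 2013 has months 1, 4 and 7 (Q1 to Q3), which gives the 2012 and 2013 rows in the expected output. Years in which an id has no rows at all are not generated.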
@Babu: If the quarter function does not exist in your version of Hive, I have an alternate function for getting the quarter of a given date. Hope this helps. Thanks!
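The alternate function mentioned above is not shown in this thread. One common substitute, offered here as an assumption rather than the answerer's exact code, derives the quarter from the month number with plain arithmetic, which works in older Hive versions that lack quarter():

```sql
-- Fallback when quarter() is unavailable (assumed substitute, not the original):
-- months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
SELECT id,
       year(month_id)                         AS year,
       CAST(ceil(month(month_id) / 3) AS INT) AS qtr
FROM test.patientid;
```

In Hive, / always returns a double, so month 4 gives 1.33 and ceil rounds it up to 2; the CAST just turns ceil's BIGINT result into an INT. The same expression can be used anywhere in place of quarter(month_id).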