Divide Ids based on quarter and the count either 1

2019-08-18 19:06发布

问题:

We have two columns Id and month Id.

The output what I'm looking for is to divide year from month Id based on quarter granularity. The activity column should be from quarter. If id is active activity should be 1 else 0 .If id comes in any of the 1st quarter (eg:only 1) the activity is still 1 .

Like this:

id           month_dt
-----------------------------------
1000000000   2012-03-01 00:00:00.0
1000000000   2015-09-01 00:00:00.0
1000000000   2016-10-01 00:00:00.0
1000000000   2015-11-01 00:00:00.0
1000000000   2014-01-01 00:00:00.0
1000000000   2013-04-01 00:00:00.0
1000000000   2014-12-01 00:00:00.0
1000000000   2015-02-01 00:00:00.0
1000000000   2014-06-01 00:00:00.0
1000000000   2013-01-01 00:00:00.0
1000000000   2014-05-01 00:00:00.0
1000000000   2016-05-01 00:00:00.0
1000000000   2013-07-01 00:00:00.0

What is expected:

ID           YEAR     QTR      ACTIVITY (1 or 0)
--------------------------------------------------
1000000000   2012      1          1
1000000000   2012      2          0
1000000000   2012      3          0
1000000000   2012      4          0
1000000000   2013      1          1
1000000000   2013      2          1
1000000000   2013      3          1
1000000000   2013      4          0

Below is the one I tried but it doesn't return the expected results. Please help me achieve this

SELECT
    a.id, a.year,
    SUM(CASE WHEN quarter BETWEEN 1 AND 3 THEN 1 ELSE 0 END) AS Q1,
    SUM(CASE WHEN quarter BETWEEN 4 AND 6 THEN 1 ELSE 0 END) AS Q2,
    SUM(CASE WHEN quarter BETWEEN 7 AND 9 THEN 1 ELSE 0 END) AS Q3,
    SUM(CASE WHEN quarter BETWEEN 10 AND 12 THEN 1 ELSE 0 END) AS Q4
FROM
    (SELECT
         id, 
         TRIM(SUBSTRING(month_id, 1, 4)) AS year,
         TRIM(regexp_replace(SUBSTR(month_id, 5, 4), "-", "")) as quarter    
     FROM
         test.patientid) a
GROUP BY 
    a.id, a.year

回答1:

I think you are looking for something like this:

select y.yyyy, q.q,
       (case when count(t.month_dt) > 0 then 1 else 0 end) as activity_flag
from (select distinct year(month_dt) as yyyy from t) y cross join
     (select distinct quarter(month_dt) as q from t) q left join
     t
     on year(t.month_dt) = y.yyyy and quarter(t.month_dt) = q.q
group by y.yyyy, q.q;

This assumes that there is at least one activity for each quarter in a year (regardless of the quarter). Otherwise, you just need to put in a list of 1, 2, 3, and 4 to get the quarters.



回答2:

@Babu; If the function quarter does not exist in your version of hive, I have an alternate function for getting the quarter on a give date. Hope this helps. Thanks!

create table qtrs(qtr int);
insert into qtrs  values (1),(2),(3),(4);

create table ims 
(id int,
month_dt date
);

insert into ims values
(100, '2012-03-01'), 
(100, '2013-04-01'), 
(100, '2013-01-01'), 
(100, '2013-07-01'), 
(100, '2014-01-01'), 
(100, '2014-05-01'), 
(100, '2014-06-01'), 
(100, '2014-12-01'), 
(100, '2015-02-01'), 
(100, '2015-09-01'), 
(100, '2015-11-01'), 
(100, '2016-05-01'), 
(100, '2016-10-01'); 

insert into ims values
(200, '2012-03-01'), 
(200, '2013-04-01'); 

Query:
select DISTINCT NVL(ims.id, qtr.id) as id,qtr.year as year,qtr.qtr as qtr,
IF(ims.id is null, 0, 1) as activity  
from jbacoy.ims ims
right join (select distinct ims.id,YEAR(ims.month_dt) as year,qtrs.qtr from jbacoy.ims ims join jbacoy.qtrs qtrs) qtr 
on (ims.id=qtr.id and year(ims.month_dt)=qtr.year and int((month(month_dt)-1)/3)+1=qtr.qtr)
sort by id, year, qtr;

Result:
id      year    qtr     activity
100     2012    1       1
100     2012    2       0
100     2012    3       0
100     2012    4       0
100     2013    1       1
100     2013    2       1
100     2013    3       1
100     2013    4       0
100     2014    1       1
100     2014    2       1
100     2014    3       0
100     2014    4       1
100     2015    1       1
100     2015    2       0
100     2015    3       1
100     2015    4       1
100     2016    1       0
100     2016    2       1
100     2016    3       0
100     2016    4       1
200     2012    1       1
200     2012    2       0
200     2012    3       0
200     2012    4       0
200     2013    1       0
200     2013    2       1
200     2013    3       0
200     2013    4       0


标签: sql hive hiveql