query to divide data

Posted 2019-08-30 23:36

Question:

We have two columns, id and monthid.

The output I'm looking for splits monthid into year and quarter, with one row per id, year and quarter. The output column should be 1 if the id is active in that quarter and 0 otherwise. If the id appears in only one month of a quarter (e.g., only month 1 of the first quarter), the output is still 1.

Like this:

id           month
-----------------------------------
100   2012-03-01 00:00:00.0
100   2015-09-01 00:00:00.0
100   2016-10-01 00:00:00.0
100   2015-11-01 00:00:00.0
100   2014-01-01 00:00:00.0
100   2013-04-01 00:00:00.0
100   2014-12-01 00:00:00.0
100   2015-02-01 00:00:00.0
100   2014-06-01 00:00:00.0
100   2013-01-01 00:00:00.0
100   2014-05-01 00:00:00.0
100   2016-05-01 00:00:00.0
100   2013-07-01 00:00:00.0

The result should look something like this:

ID    YEAR     QTR      output (1 or 0)
--------------------------------------------------
100   2012      1          1
100   2012      2          0
100   2012      3          0
100   2012      4          0
100   2013      1          1
100   2013      2          1
100   2013      3          1
100   2013      4          0

Below is the query I tried, but it doesn't return the expected results. Please help me achieve this. I also want the rows where the output is 0.

select a.id,a.year,a.month,
CASE WHEN a.month BETWEEN 1 AND 4 THEN 1 
 ELSE 0 END as output
from
(select id,trim(substring(claim_month_id,1,4)) as year,(INT((MONTH(monthid)-1)/3)+1) as month from test) a
group by a.id,a.year,a.month

Any help would be appreciated.

Answer 1:

@Ani: Hive has no hierarchical query that could generate the four quarters (1, 2, 3, 4), so I created a small table for them. Then I take every patient id and year that exists in the ims_patient_activity_diagnosis table and combine each with all four quarters. Finally, I right join the activity table to that set of all possible id, year and quarter combinations. If an id, year and quarter combination has no matching row on the activity side, there was no activity for it, so I assign activity = 0 to those rows. I also inserted patient id = 200 to test the query with more than one patient id in the table. Hope this helps. Thanks.
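The join condition in the query that follows maps a month number to its quarter with INT((MONTH(month_dt)-1)/3)+1. As a quick sanity check of that arithmetic, here is a minimal Python sketch; the function name quarter_of is hypothetical and not part of the original answer.

```python
def quarter_of(month: int) -> int:
    """Map a calendar month (1-12) to its quarter (1-4).

    Mirrors INT((MONTH(d) - 1) / 3) + 1 from the Hive query:
    integer division by 3 groups months into blocks of three.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    return (month - 1) // 3 + 1


# Build the full month -> quarter mapping and print it for inspection.
quarters = {m: quarter_of(m) for m in range(1, 13)}
print(quarters)
```

Months 1 to 3 land in quarter 1, months 4 to 6 in quarter 2, and so on, which is why a single active month is enough to mark the whole quarter as active.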

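-- Lookup table for the four quarters (the column is named month but stores the quarter number).
create table dbo.qtrs(month int);
insert into dbo.qtrs values (1),(2),(3),(4);

-- The inner query builds every (id, year, quarter) combination; the right join keeps all of them.
select DISTINCT NVL(ims.id, qtr.id) as patient_id,
qtr.year as year,
qtr.month as month,
CASE WHEN ims.id IS NOT NULL THEN 1 ELSE 0 END as activity  -- 1 if an activity row matched, else 0
from sandbox_grwi.ims_patient_activity_diagnosis ims
right join (select distinct ims.id,YEAR(ims.month_dt) as year,qtrs.month from sandbox_grwi.ims_patient_activity_diagnosis ims cross join dbo.qtrs qtrs) qtr 
on (ims.id=qtr.id and YEAR(ims.month_dt)=qtr.year and INT((MONTH(ims.month_dt)-1)/3)+1=qtr.month)
-- order by gives a total ordering; sort by only orders rows within each reducer.
order by patient_id, year, month;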

Sample Result:
p_id    year    month   activity
100     2012    1       1
100     2012    2       0
100     2012    3       0
100     2012    4       0
100     2013    1       1
100     2013    2       1
100     2013    3       1
100     2013    4       0
100     2014    1       1
100     2014    2       1
100     2014    3       0
100     2014    4       1
100     2015    1       1
100     2015    2       0
100     2015    3       1
100     2015    4       1
100     2016    1       0
100     2016    2       1
100     2016    3       0
100     2016    4       1
200     2012    1       1
200     2012    2       0
200     2012    3       0
200     2012    4       0
200     2013    1       0
200     2013    2       1
200     2013    3       0
200     2013    4       0


Additional sample data (patient id 200):
insert into sandbox_grwi.ims_patient_activity_diagnosis values
(200, '2012-03-01'), 
(200, '2013-04-01');
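
The approach in this answer can be tried without a Hive cluster. The following is a rough sketch, not the answer's own code: it assumes Python's built-in sqlite3 module as a stand-in for Hive, uses the hypothetical table names activity and qtrs, replaces YEAR() and MONTH() with SQLite's strftime, and writes the join as a LEFT JOIN from the scaffold of all combinations (equivalent to the right join above, and older SQLite versions do not support RIGHT JOIN).

```python
import sqlite3

# Sample rows from the question (id 100) plus the extra rows for id 200.
rows = [
    (100, "2012-03-01"), (100, "2015-09-01"), (100, "2016-10-01"),
    (100, "2015-11-01"), (100, "2014-01-01"), (100, "2013-04-01"),
    (100, "2014-12-01"), (100, "2015-02-01"), (100, "2014-06-01"),
    (100, "2013-01-01"), (100, "2014-05-01"), (100, "2016-05-01"),
    (100, "2013-07-01"), (200, "2012-03-01"), (200, "2013-04-01"),
]

conn = sqlite3.connect(":memory:")
# Hypothetical table names; the original uses sandbox_grwi.ims_patient_activity_diagnosis and dbo.qtrs.
conn.execute("CREATE TABLE activity (id INTEGER, month_dt TEXT)")
conn.executemany("INSERT INTO activity VALUES (?, ?)", rows)
conn.execute("CREATE TABLE qtrs (qtr INTEGER)")
conn.executemany("INSERT INTO qtrs VALUES (?)", [(1,), (2,), (3,), (4,)])

query = """
WITH act AS (
    -- One row per (id, year, quarter) that actually has activity.
    SELECT DISTINCT
        id,
        CAST(strftime('%Y', month_dt) AS INTEGER) AS year,
        (CAST(strftime('%m', month_dt) AS INTEGER) - 1) / 3 + 1 AS qtr
    FROM activity
),
scaffold AS (
    -- Every (id, year) seen in the data, paired with all four quarters.
    SELECT DISTINCT a.id, a.year, q.qtr
    FROM act a CROSS JOIN qtrs q
)
SELECT s.id, s.year, s.qtr,
       CASE WHEN act.id IS NULL THEN 0 ELSE 1 END AS activity
FROM scaffold s
LEFT JOIN act
  ON act.id = s.id AND act.year = s.year AND act.qtr = s.qtr
ORDER BY s.id, s.year, s.qtr
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each printed tuple is (id, year, quarter, activity). Building the scaffold first is what makes the 0 rows appear: a plain GROUP BY over the activity table can only return quarters that already have data.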