Facing issue in Hive query in generating missing d

2020-04-14 07:41发布

问题:

I have a requirement where I need to go back to previous values for a column until 1000 rows and get those previous 1000 dates for my next steps, but all those 1000 previous dates are not present for that column in the table. But I need those missing dates to get from output of the query.

When I try to run below query it is not displaying 1000 previous date values from current date.

Example: let's say only 2 dates are available for date column

date      
2019-01-16 
2019-01-19

I have come up with a query to get back 1000 dates but it is giving only nearest date as all previous back dates are missing

SELECT date FROM  table1 t
WHERE 
date >= date_sub(current_date,1000) and  dt<current_date ORDER BY date LIMIT 1

If I run above query it is displaying 2019-01-16, since previous 1000 days back date are not present it is giving nearest date ,which is 2019-01-16 but I need missing dates starting from 2016-04-23 (1000th date from current date) till before current date (2019-01-18) as output of my query.

回答1:

You can generate dates for required range in the subquery (see date_range subquery in the example below) and left join it with your table. If there is no record in your table on some dates, the value will be null, dates will be returned from the date_range subquery without gaps. Set start_date and end_date parameters for date_range required:

set hivevar:start_date=2016-04-23; --replace with your start_date
set hivevar:end_date=current_date; --replace with your end_date

set hive.exec.parallel=true;
set hive.auto.convert.join=true; --this enables map-join
set hive.mapjoin.smalltable.filesize=25000000; --size of table to fit in memory

with date_range as 
(--this query generates date range, check it's output
select date_add ('${hivevar:start_date}',s.i) as dt 
  from ( select posexplode(split(space(datediff(${hivevar:end_date},'${hivevar:start_date}')),' ')) as (i,x) ) s
) 

select d.dt as date,
       t.your_col --some value from your table on date
  from date_range d 
       left join table1 t on d.dt=t.date 
order by d.dt --order by dates if necessary