Why partitions elimination does not happen for thi

2019-01-20 18:32发布

I have a hive table which is partitioned by year, month, day and hour. I need to run a query against it to fetch the last 7 days data. This is in Hive 0.14.0.2.2.4.2-2. My query currently looks like this :

SELECT COUNT(column_name) from table_name 
where year >= year(date_sub(from_unixtime(unix_timestamp()), 7)) 
AND month >= month(date_sub(from_unixtime(unix_timestamp()), 7)) 
AND day >= day(date_sub(from_unixtime(unix_timestamp()), 7));

This takes a very long time. When I substitute the actual numbers for the above say something like :

SELECT COUNT(column_name) from table_name 
where year >= 2017
AND month >= 2
AND day >= 13

it finishes in a few minutes. Is there any way to change the above script so that is actually includes just the numbers in the query instead of the functions?

I tried using set like:

set yearLimit = year(date_sub(from_unixtime(unix_timestamp()), 7));

SELECT COUNT(column_name) from table_name 
where year >= ${hiveconf:yearLimit}
AND month >= month(date_sub(from_unixtime(unix_timestamp()), 7)) 
AND day >= day(date_sub(from_unixtime(unix_timestamp()), 7));

but this does not solve the issue.

1条回答
老娘就宠你
2楼-- · 2019-01-20 19:02

Solution

select      count (column_name) 

from        table_name 

where       year  >= year  (date_sub (current_date,7)) 
        and month >= month (date_sub (current_date,7)) 
        and day   >= day   (date_sub (current_date,7))
;

What went wrong with the original query?

unix_timestamp()

Gets current Unix timestamp in seconds. This function is not deterministic and its value is not fixed for the scope of a query execution, therefore prevents proper optimization of queries - this has been deprecated since 2.0 in favour of CURRENT_TIMESTAMP constant.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

(I've just changed the documentation a little bit :-))

Since unix_timestamp() values might change during the execution, the expression should be evaluated for each row, therefore preventing partitions elimination.

Why using SET did not work?

set is nothing but a text replacement mechanism.
Nothing is being computed during the set.
The only thing that happens is that variables are being assigned a text.
Before the query is being executed the variables place holders (${hiveconf:...}) are being replaced with the assigned text.
Only then the query is being parsed and executed.

hive> set a=sele;
hive> set b=ct 1+;
hive> set c=1;
hive> ${hiveconf:a}${hiveconf:b}${hiveconf:c};
OK
2

Demo

create table table_name (column_name int) partitioned by (year int,month int,day int);

set hive.exec.dynamic.partition.mode=nonstrict;


insert into table_name partition (year,month,day) 

select  pos
       ,year(dt)
       ,month(dt)
       ,day(dt) 

from   (select  pe.pos
               ,date_sub (current_date,pe.pos) as dt

        from    (select 1) x 
                lateral view posexplode (split (space (99),' ')) pe
        ) t
;

explain dependency

select      count (column_name) 

from        table_name 

where       year  >= year  (date_sub (from_unixtime (unix_timestamp ()),7)) 
        and month >= month (date_sub (from_unixtime (unix_timestamp ()),7)) 
        and day   >= day   (date_sub (from_unixtime (unix_timestamp ()),7))
;

{"input_partitions":[{"partitionName":"default@table_name@year=2016/month=11/day=14"},{"partitionName":"default@table_name@year=2016/month=11/day=15"},{"partitionName":"default@table_name@year=2016/month=11/day=16"},{"partitionName":"default@table_name@year=2016/month=11/day=17"},{"partitionName":"default@table_name@year=2016/month=11/day=18"},{"partitionName":"default@table_name@year=2016/month=11/day=19"},{"partitionName":"default@table_name@year=2016/month=11/day=20"},{"partitionName":"default@table_name@year=2016/month=11/day=21"},{"partitionName":"default@table_name@year=2016/month=11/day=22"},{"partitionName":"default@table_name@year=2016/month=11/day=23"},{"partitionName":"default@table_name@year=2016/month=11/day=24"},{"partitionName":"default@table_name@year=2016/month=11/day=25"},{"partitionName":"default@table_name@year=2016/month=11/day=26"},{"partitionName":"default@table_name@year=2016/month=11/day=27"},{"partitionName":"default@table_name@year=2016/month=11/day=28"},{"partitionName":"default@table_name@year=2016/month=11/day=29"},{"partitionName":"default@table_name@year=2016/month=11/day=30"},{"partitionName":"default@table_name@year=2016/month=12/day=1"},{"partitionName":"default@table_name@year=2016/month=12/day=10"},{"partitionName":"default@table_name@year=2016/month=12/day=11"},{"partitionName":"default@table_name@year=2016/month=12/day=12"},{"partitionName":"default@table_name@year=2016/month=12/day=13"},{"partitionName":"default@table_name@year=2016/month=12/day=14"},{"partitionName":"default@table_name@year=2016/month=12/day=15"},{"partitionName":"default@table_name@year=2016/month=12/day=16"},{"partitionName":"default@table_name@year=2016/month=12/day=17"},{"partitionName":"default@table_name@year=2016/month=12/day=18"},{"partitionName":"default@table_name@year=2016/month=12/day=19"},{"partitionName":"default@table_name@year=2016/month=12/day=2"},{"partitionName":"default@table_name@year=2016/month=12/day=20"},{"partitionName":"default@table_name@year=2016/month=12/day=21"},{"partitionName":"default@table_name@year=2016/month=12/day=22"},{"partitionName":"default@table_name@year=2016/month=12/day=23"},{"partitionName":"default@table_name@year=2016/month=12/day=24"},{"partitionName":"default@table_name@year=2016/month=12/day=25"},{"partitionName":"default@table_name@year=2016/month=12/day=26"},{"partitionName":"default@table_name@year=2016/month=12/day=27"},{"partitionName":"default@table_name@year=2016/month=12/day=28"},{"partitionName":"default@table_name@year=2016/month=12/day=29"},{"partitionName":"default@table_name@year=2016/month=12/day=3"},{"partitionName":"default@table_name@year=2016/month=12/day=30"},{"partitionName":"default@table_name@year=2016/month=12/day=31"},{"partitionName":"default@table_name@year=2016/month=12/day=4"},{"partitionName":"default@table_name@year=2016/month=12/day=5"},{"partitionName":"default@table_name@year=2016/month=12/day=6"},{"partitionName":"default@table_name@year=2016/month=12/day=7"},{"partitionName":"default@table_name@year=2016/month=12/day=8"},{"partitionName":"default@table_name@year=2016/month=12/day=9"},{"partitionName":"default@table_name@year=2017/month=1/day=1"},{"partitionName":"default@table_name@year=2017/month=1/day=10"},{"partitionName":"default@table_name@year=2017/month=1/day=11"},{"partitionName":"default@table_name@year=2017/month=1/day=12"},{"partitionName":"default@table_name@year=2017/month=1/day=13"},{"partitionName":"default@table_name@year=2017/month=1/day=14"},{"partitionName":"default@table_name@year=2017/month=1/day=15"},{"partitionName":"default@table_name@year=2017/month=1/day=16"},{"partitionName":"default@table_name@year=2017/month=1/day=17"},{"partitionName":"default@table_name@year=2017/month=1/day=18"},{"partitionName":"default@table_name@year=2017/month=1/day=19"},{"partitionName":"default@table_name@year=2017/month=1/day=2"},{"partitionName":"default@table_name@year=2017/month=1/day=20"},{"partitionName":"default@table_name@year=2017/month=1/day=21"},{"partitionName":"default@table_name@year=2017/month=1/day=22"},{"partitionName":"default@table_name@year=2017/month=1/day=23"},{"partitionName":"default@table_name@year=2017/month=1/day=24"},{"partitionName":"default@table_name@year=2017/month=1/day=25"},{"partitionName":"default@table_name@year=2017/month=1/day=26"},{"partitionName":"default@table_name@year=2017/month=1/day=27"},{"partitionName":"default@table_name@year=2017/month=1/day=28"},{"partitionName":"default@table_name@year=2017/month=1/day=29"},{"partitionName":"default@table_name@year=2017/month=1/day=3"},{"partitionName":"default@table_name@year=2017/month=1/day=30"},{"partitionName":"default@table_name@year=2017/month=1/day=31"},{"partitionName":"default@table_name@year=2017/month=1/day=4"},{"partitionName":"default@table_name@year=2017/month=1/day=5"},{"partitionName":"default@table_name@year=2017/month=1/day=6"},{"partitionName":"default@table_name@year=2017/month=1/day=7"},{"partitionName":"default@table_name@year=2017/month=1/day=8"},{"partitionName":"default@table_name@year=2017/month=1/day=9"},{"partitionName":"default@table_name@year=2017/month=2/day=1"},{"partitionName":"default@table_name@year=2017/month=2/day=10"},{"partitionName":"default@table_name@year=2017/month=2/day=11"},{"partitionName":"default@table_name@year=2017/month=2/day=12"},{"partitionName":"default@table_name@year=2017/month=2/day=13"},{"partitionName":"default@table_name@year=2017/month=2/day=14"},{"partitionName":"default@table_name@year=2017/month=2/day=15"},{"partitionName":"default@table_name@year=2017/month=2/day=16"},{"partitionName":"default@table_name@year=2017/month=2/day=17"},{"partitionName":"default@table_name@year=2017/month=2/day=18"},{"partitionName":"default@table_name@year=2017/month=2/day=19"},{"partitionName":"default@table_name@year=2017/month=2/day=2"},{"partitionName":"default@table_name@year=2017/month=2/day=20"},{"partitionName":"default@table_name@year=2017/month=2/day=21"},{"partitionName":"default@table_name@year=2017/month=2/day=3"},{"partitionName":"default@table_name@year=2017/month=2/day=4"},{"partitionName":"default@table_name@year=2017/month=2/day=5"},{"partitionName":"default@table_name@year=2017/month=2/day=6"},{"partitionName":"default@table_name@year=2017/month=2/day=7"},{"partitionName":"default@table_name@year=2017/month=2/day=8"},{"partitionName":"default@table_name@year=2017/month=2/day=9"}],"input_tables":[{"tablename":"default@table_name","tabletype":"MANAGED_TABLE"}]}

explain dependency

select      count (column_name) 

from        table_name 

where       year  >= year  (date_sub (current_date,7)) 
        and month >= month (date_sub (current_date,7)) 
        and day   >= day   (date_sub (current_date,7))
;

{"input_partitions":[{"partitionName":"default@table_name@year=2017/month=2/day=14"},{"partitionName":"default@table_name@year=2017/month=2/day=15"},{"partitionName":"default@table_name@year=2017/month=2/day=16"},{"partitionName":"default@table_name@year=2017/month=2/day=17"},{"partitionName":"default@table_name@year=2017/month=2/day=18"},{"partitionName":"default@table_name@year=2017/month=2/day=19"},{"partitionName":"default@table_name@year=2017/month=2/day=20"},{"partitionName":"default@table_name@year=2017/month=2/day=21"}],"input_tables":[{"tablename":"default@table_name","tabletype":"MANAGED_TABLE"}]}

查看更多
登录 后发表回答