Get the sysdate -1 in Hive

Posted 2019-05-01 04:42

Question:

Is there any way to get the current date minus one day in Hive, so that it always gives yesterday's date, in the format 20120805?

Today is Aug 6th, so I can get the data for yesterday's date by running my query like this:

select * from table1 where dt = '20120805';

The table is partitioned on the date (dt) column, so I tried to compute yesterday's date with the date_sub function instead:

select * from table1 where dt = date_sub(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(),
'yyyyMMdd')) , 1)     limit 10;

This query looks for the data in all the partitions. Why? Am I doing something wrong in my query?

How can I make the evaluation happen in a subquery so that the whole table is not scanned?

Answer 1:

Try something like:

select * from table1 
where dt >= from_unixtime(unix_timestamp()-1*60*60*24, 'yyyyMMdd');

This works if you don't mind Hive scanning the entire table. unix_timestamp() called with no arguments is not deterministic, so Hive's query planner cannot evaluate the expression up front and prune partitions for you. In many cases (log files, for example), filtering without a constant partition key can start a very large Hadoop job, because it scans the whole table instead of only the partition with the given key.

If this matters to you, you can launch hive with an additional option

$ hive -hiveconf date_yesterday=20150331

And in the script or hive terminal use

select * from table1
where dt >= ${hiveconf:date_yesterday};

The name of the variable doesn't matter, and the value can be anything you choose, so you can fill it with the prior date using Unix commands. In the specific case of the OP:

$ hive -hiveconf date_yesterday=$(date --date yesterday "+%Y%m%d")
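
The command above relies on GNU date. As a rough sketch, the same idea can be wrapped in a small script that also falls back to the BSD/macOS syntax. The function names yesterday_yyyymmdd and run_yesterday_query are hypothetical, and the query uses = rather than >= on the assumption that only yesterday's partition is wanted:

```shell
#!/usr/bin/env bash
# Sketch only: compute yesterday's date as yyyyMMdd and pass it to Hive
# as a constant, so the partition filter is a plain literal.
# Function and variable names here are illustrative, not from the thread.

yesterday_yyyymmdd() {
  # GNU date (Linux) understands --date; BSD/macOS date uses -v-1d instead.
  date --date yesterday "+%Y%m%d" 2>/dev/null || date -v-1d "+%Y%m%d"
}

run_yesterday_query() {
  local date_yesterday
  date_yesterday="$(yesterday_yyyymmdd)"
  # Single quotes keep the shell from touching ${hiveconf:...};
  # Hive substitutes the value before it compiles the query.
  hive -hiveconf date_yesterday="${date_yesterday}" \
       -e 'select * from table1 where dt = "${hiveconf:date_yesterday}" limit 10;'
}

# Show the value that would be passed to Hive.
echo "date_yesterday=$(yesterday_yyyymmdd)"
```

Calling run_yesterday_query on a machine with the Hive CLI installed runs the query. Because the date reaches Hive as a literal, the planner should be able to restrict the scan to that one partition.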


Answer 2:

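In MySQL (curdate()-1 subtracts from the number 20120806 rather than from the date, so it breaks on the first day of a month; use an interval instead):

select DATE_FORMAT(curdate() - INTERVAL 1 DAY,'%Y%m%d');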

In SQL Server:

SELECT convert(varchar,getDate()-1,112)

In Hive, use this query (Hive takes Java-style patterns such as 'yyyyMMdd', not MySQL's '%Y%m%d'):

SELECT FROM_UNIXTIME(UNIX_TIMESTAMP()-1*24*60*60,'yyyyMMdd');


Answer 3:

It looks like DATE_SUB expects the date in yyyy-MM-dd format, so you may need some extra format manipulation to get to your format. Try this:

select * from table1 
where dt =  FROM_UNIXTIME(
                UNIX_TIMESTAMP(
                    DATE_SUB(
                        FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd')
                    , 1)
                )
            , 'yyyyMMdd')     limit 10;


Answer 4:

Use this:

select * from table1
where dt = date_format(
               concat(year(date_sub(current_timestamp,1)), '-',
                      month(date_sub(current_timestamp,1)), '-',
                      day(date_sub(current_timestamp,1))),
               'yyyyMMdd')
limit 10;

This gives a deterministic result (a string) to compare against your partition column.

I know it's super verbose.
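
A shorter form of the same idea may be possible, assuming Hive 1.2.0 or later, where current_date and date_format are available. The sketch below only prints the query; the helper name print_yesterday_query is hypothetical, and whether the planner prunes partitions for this expression is worth confirming with EXPLAIN on your own Hive version:

```shell
#!/usr/bin/env bash
# Sketch only: a less verbose variant of the query above.
# Assumes Hive 1.2.0+ (current_date, date_format); helper name is illustrative.

print_yesterday_query() {
  # Quoted heredoc delimiter: the shell leaves the SQL text untouched.
  cat <<'SQL'
select *
from table1
where dt = date_format(date_sub(current_date, 1), 'yyyyMMdd')
limit 10;
SQL
}

# Print the query text so it can be inspected before running it.
print_yesterday_query
```

To run it, pass the text to the Hive CLI, for example with hive -e "$(print_yesterday_query)".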