How do I get millisecond precision in Hive?

Posted 2019-01-23 19:59

Question:

The documentation says that timestamps support the following conversion:

• Floating point numeric types: Interpreted as UNIX timestamp in seconds with decimal precision

First of all, I'm not sure how to interpret this. If I have a timestamp 2013-01-01 12:00:00.423, can I convert this to a numeric type that retains the milliseconds? Because that is what I want.

More generally, I need to do comparisons between timestamps such as

select maxts - mints as latency from mytable

where maxts and mints are timestamp columns. Currently, this throws a NullPointerException in Hive 0.11.0. The query does run if I write something like

select unix_timestamp(maxts) - unix_timestamp(mints) as latency from mytable

but this only gives second precision, not millisecond precision.

Any help appreciated. Tell me if you need additional information.

Answer 1:

If you want to work with milliseconds, don't use the unix_timestamp functions: they return whole seconds since the epoch, so the fractional part is lost.

hive> describe function extended unix_timestamp;
unix_timestamp([date[, pattern]]) - Returns the UNIX timestamp
Converts the current or specified time to number of seconds since 1970-01-01.

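Instead, cast the JDBC-compliant timestamp to double. The result is the UNIX time in seconds, with the milliseconds kept as the decimal part.
For example:

Given the following tab-delimited data and an external table defined over its directory:

cat /user/hive/ts/data.txt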
a   2013-01-01 12:00:00.423   2013-01-01 12:00:00.433
b   2013-01-01 12:00:00.423   2013-01-01 12:00:00.733

CREATE EXTERNAL TABLE ts (txt string, st Timestamp, et Timestamp) 
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/ts';
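
As a quick check that the cast keeps the fractional part, a sketch like the following, run against the ts table above, puts the whole-second value next to the double. The column aliases are illustrative, and the exact epoch values depend on the session time zone, so no output is shown.

```sql
-- Compare the whole-second and fractional-second views of the same column.
-- unix_timestamp() returns a bigint count of seconds, so the .423 is dropped;
-- casting to double keeps it as the decimal part.
select
  txt,
  unix_timestamp(st) as st_seconds,
  cast(st as double) as st_fractional
from ts;
```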

Then you can query the difference between the start time (st) and the end time (et) in milliseconds as follows:

select 
  txt, 
  cast(
    round(
      cast((e-s) as double) * 1000
    ) as int
  ) latency 
from (select txt, cast(st as double) s, cast(et as double) e from ts) q;
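
The same computation can also be written without the subquery. The sketch below is an equivalent variant rather than part of the original answer. It casts to bigint instead of int on the assumption that some intervals may exceed roughly 24.8 days, the point at which a millisecond count overflows a 32-bit int. The alias latency_ms is illustrative.

```sql
-- Difference in milliseconds between two timestamp columns.
-- round() absorbs the small floating-point error left by subtracting doubles.
select
  txt,
  cast(round((cast(et as double) - cast(st as double)) * 1000) as bigint) as latency_ms
from ts;

-- The same pattern applied to the columns from the question.
select
  cast(round((cast(maxts as double) - cast(mints as double)) * 1000) as bigint) as latency
from mytable;
```

For the two sample rows above, the first query should give 10 for a and 310 for b.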