The documentation says that timestamps support the following conversion:
•Floating point numeric types: Interpreted as UNIX timestamp in seconds with decimal precision
First of all, I'm not sure how to interpret this. If I have a timestamp 2013-01-01 12:00:00.423, can I convert this to a numeric type that retains the milliseconds? Because that is what I want.
More generally, I need to do comparisons between timestamps such as
select maxts - mints as latency from mytable
where maxts and mints are timestamp columns. Currently, this gives me NullPointerException
using Hive 0.11.0. I am able to perform queries if I do something like
select unix_timestamp(maxts) - unix_timestamp(mints) as latency from mytable
but this only works for seconds, not millisecond precision.
Any help appreciated. Tell me if you need additional information.
If you want to work with milliseconds, don't use the unix timestamp functions because these consider date as seconds since epoch.
hive> describe function extended unix_timestamp;
unix_timestamp([date[, pattern]]) - Returns the UNIX timestamp
Converts the current or specified time to number of seconds since 1970-01-01.
Instead, convert the JDBC compliant timestamp to double.
E.g:
Given a tab delimited data:
cat /user/hive/ts/data.txt :
a 2013-01-01 12:00:00.423 2013-01-01 12:00:00.433
b 2013-01-01 12:00:00.423 2013-01-01 12:00:00.733
CREATE EXTERNAL TABLE ts (txt string, st Timestamp, et Timestamp)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/ts';
Then you may query the difference between startTime(st) and endTime(et) in milliseconds as follows:
select
txt,
cast(
round(
cast((e-s) as double) * 1000
) as int
) latency
from (select txt, cast(st as double) s, cast(et as double) e from ts) q;