Is it true that Hive
and SparkSQL
do not support the datatype of datetime
?
From my reading of the references, they seem to support only date
and timestamp
. The former does not a time component (i.e. hour, minute, and second); the latter has high precision (down to millisecond) but is not very human readable (it always require a conversion by from_unixtime()
or date_format()
, and the result would be a string, not a datetime
type).
In contrast, other database systems, such as MySQL
does have a datatype of datetime
. (E.g. see this ref)
Any good suggestions/tips how to work around this limitation?
These are my references:
Updated: On human readability
Here I give an example on MySQL
to illustrate my point
about human-readability:
-- MySQL code
select
cast(now() as date) as asDate, -- human readable
cast(now() as dateTime) as asDateTime, -- human readable
unix_timestamp(now()) as asUnixTimestamp, -- not H/R
cast(from_unixtime(unix_timestamp(now()))
as dateTime)
asDateTimeAgain -- cast into dateTime to make it H/R
The display would be this:
(Pay attention to the fourth column asDateTimeAgain
, which is human readable)
+------------+---------------------+-----------------+---------------------+
| asDate | asDateTime | asUnixTimestamp | asDateTimeAgain |
+------------+---------------------+-----------------+---------------------+
| 2017-10-21 | 2017-10-21 22:37:15 | 1508625435 | 2017-10-21 22:37:15 |
+------------+---------------------+-----------------+---------------------+
1 row in set (0.00 sec)
And the limitation is not just about human-readability. A string
representation of datetime
is
human readable, but then it lost the property of datetime
and
will require further conversion back into datatime
for date/time processing,
such as min()
, max()
, and capturing the values into java.util.Date
-- Hive/SparkSQL code
select
current_date asDate,
unix_timestamp() asUnixTimestamp,
from_unixtime(unix_timestamp(),
'yyyy-MM-dd HH:mm:ss') asString
The output would be this, where the third column is a string and not
a datetime
type
------------------------------------------------------
| asDate | asUnixTimestamp | asString |
| ---------- | --------------- | ------------------- |
| 2017-10-21 | 1508625586 | 2017-10-21 22:39:46 |
------------------------------------------------------
(I am providing the answer myself here)
Do not confuse
timestamp
with"unix timestamp"timestamp
is actually human readable; while "unix timestamp", being the number of seconds/milliseconds since Jan 1, 1970, is indeed not very human readable.However, we can use
cast()
to convert the latter (through a functionfrom_unixtime()
) to get the former.Result:
Timestamp
does what you want. I'm not sure what you mean by "human-readable". All databases store dates and date times in an internal format, that is not at all related to the display format. And -- to a close approximation -- the internal formats are different for different databases. (Wikipedia has a fun article on some of the different "0" dates used by different computer systems.)If you want to see the value in a particular way, then format the result as a string. Hive has a full complement of string functions for handling
timestamp
values, and it easily converts string literals into timestamps, when needed.