Hive and SparkSQL do not support datetime type?

2019-08-06 04:13发布

Is it true that Hive and SparkSQL do not support the datatype of datetime?

From my reading of the references, they seem to support only date and timestamp. The former does not a time component (i.e. hour, minute, and second); the latter has high precision (down to millisecond) but is not very human readable (it always require a conversion by from_unixtime() or date_format(), and the result would be a string, not a datetime type).

In contrast, other database systems, such as MySQL does have a datatype of datetime. (E.g. see this ref)

Any good suggestions/tips how to work around this limitation?

These are my references:

  1. Hive Language Manual: Date/Time Types
  2. Hive Language Manual: Date Functions

Updated: On human readability

Here I give an example on MySQL to illustrate my point about human-readability:

-- MySQL code 
select 
  cast(now() as date)         as asDate,          -- human readable
  cast(now() as dateTime)     as asDateTime,      -- human readable  
  unix_timestamp(now())       as asUnixTimestamp, -- not H/R
  cast(from_unixtime(unix_timestamp(now())) 
        as dateTime) 
                              asDateTimeAgain     -- cast into dateTime to make it H/R

The display would be this:

(Pay attention to the fourth column asDateTimeAgain, which is human readable)

+------------+---------------------+-----------------+---------------------+
| asDate     | asDateTime          | asUnixTimestamp | asDateTimeAgain     |
+------------+---------------------+-----------------+---------------------+
| 2017-10-21 | 2017-10-21 22:37:15 |      1508625435 | 2017-10-21 22:37:15 |
+------------+---------------------+-----------------+---------------------+
1 row in set (0.00 sec)

And the limitation is not just about human-readability. A string representation of datetime is human readable, but then it lost the property of datetime and will require further conversion back into datatime for date/time processing, such as min(), max(), and capturing the values into java.util.Date

-- Hive/SparkSQL  code
select 
  current_date                    asDate,
  unix_timestamp()                asUnixTimestamp,
  from_unixtime(unix_timestamp(), 
    'yyyy-MM-dd HH:mm:ss')        asString

The output would be this, where the third column is a string and not a datetime type

------------------------------------------------------
| asDate     | asUnixTimestamp | asString            |
| ---------- | --------------- | ------------------- |
| 2017-10-21 | 1508625586      | 2017-10-21 22:39:46 |
------------------------------------------------------

2条回答
乱世女痞
2楼-- · 2019-08-06 04:59

(I am providing the answer myself here)

Do not confuse timestamp with"unix timestamp"

timestamp is actually human readable; while "unix timestamp", being the number of seconds/milliseconds since Jan 1, 1970, is indeed not very human readable.

However, we can use cast() to convert the latter (through a function from_unixtime()) to get the former.

-- Hive/SparkSQL  code
select 
  current_date                    asDate,          -- human readable
  unix_timestamp()                asUnixTimestamp, -- not human readable
  from_unixtime(unix_timestamp()) asString,        -- human readable
  cast(from_unixtime(unix_timestamp())             
    as date)                      asDate2,         -- human readable
  cast(from_unixtime(unix_timestamp())             
    as timestamp)                 asTimestamp      -- human readable

Result:

-------------------------------------------------------------------------------------------
| asDate     | asUnixTimestamp | asString            | asDate2    | asTimestamp           |
| ---------- | --------------- | ------------------- | ---------- | --------------------- |
| 2017-10-22 | 1508687321      | 2017-10-22 15:48:41 | 2017-10-22 | 2017-10-22 15:48:41.0 |
-------------------------------------------------------------------------------------------
查看更多
劳资没心,怎么记你
3楼-- · 2019-08-06 05:06

Timestamp does what you want. I'm not sure what you mean by "human-readable". All databases store dates and date times in an internal format, that is not at all related to the display format. And -- to a close approximation -- the internal formats are different for different databases. (Wikipedia has a fun article on some of the different "0" dates used by different computer systems.)

If you want to see the value in a particular way, then format the result as a string. Hive has a full complement of string functions for handling timestamp values, and it easily converts string literals into timestamps, when needed.

查看更多
登录 后发表回答