How to record created_at and updated_at timestamps

Posted 2019-01-28 22:11

MySQL can automatically record created_at and updated_at timestamps. Does Hive provide similar mechanisms? If not, what would be the best way to achieve this functionality?

Tags: sql hive hiveql
1 answer
你好瞎i
Reply #2 · 2019-01-28 22:58

Hive does not provide such a mechanism. You can achieve this by calling a UDF in your select: from_unixtime(unix_timestamp()) as created_at. Note that this expression is evaluated in each mapper or reducer, so different rows may get different values. If you need the same value for the whole dataset (for Hive versions before 1.2.0), pass a variable to the script and use it inside as: '${hiveconf:created_at}' as created_at
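A minimal sketch of both variants, in case it helps. The table names (staging_events, events), the columns (id, payload) and the script name load_events.hql are made up for illustration, so adjust them to your schema:

```sql
-- Hypothetical tables: staging_events (source) and events (target with a created_at column).

-- Variant 1: evaluated per task, so rows written by different
-- mappers/reducers may carry slightly different timestamps.
INSERT INTO TABLE events
SELECT s.id,
       s.payload,
       from_unixtime(unix_timestamp()) AS created_at
FROM staging_events s;

-- Variant 2: one value for the whole dataset, supplied from outside, e.g.
--   hive --hiveconf created_at="2019-01-28 22:58:00" -f load_events.hql
-- (load_events.hql is a hypothetical script containing the statement below).
INSERT INTO TABLE events
SELECT s.id,
       s.payload,
       '${hiveconf:created_at}' AS created_at
FROM staging_events s;
```

In the second variant the value is substituted into the script as text before the query runs, so every row gets exactly the same string.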

Update: current_timestamp returns the current timestamp at the start of query evaluation (available as of Hive 1.2.0). All calls to current_timestamp within the same query return the same value. By contrast, the documentation says of unix_timestamp() that it gets the current Unix timestamp in seconds, that it is non-deterministic and prevents proper optimization of queries, and that it has been deprecated since 2.0 in favour of the CURRENT_TIMESTAMP constant. So CURRENT_TIMESTAMP is not a function, it's a constant! See the docs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
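A quick way to see this for yourself, assuming Hive 1.2.0 or later (the column aliases are arbitrary):

```sql
-- Both columns are resolved once, at the start of query evaluation,
-- so ts_a and ts_b hold the same value.
SELECT current_timestamp AS ts_a,
       current_timestamp AS ts_b;
```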

For Hive queries, CURRENT_TIMESTAMP is preferable when you rewrite tables or partitions, or insert into them: whole files are rewritten anyway, not individual records, so the created_at timestamp should be the same for everything written by that query.
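For example, a partition rewrite could look roughly like this. It is only a sketch: events_by_day, staging_events, the dt partition column and the other column names are assumptions, not anything from your setup:

```sql
-- Hypothetical target table events_by_day, partitioned by dt,
-- with columns (id, payload, created_at).
-- Requires Hive 1.2.0+ for current_timestamp.
INSERT OVERWRITE TABLE events_by_day PARTITION (dt = '2019-01-28')
SELECT s.id,
       s.payload,
       current_timestamp AS created_at  -- same value for every row in this query
FROM staging_events s
WHERE s.dt = '2019-01-28';
```

Since the partition's files are replaced as a whole, a single timestamp per load describes when that data was written.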
