MySQL can automatically record created_at and updated_at timestamps. Does Hive provide similar mechanisms? If not, what would be the best way to achieve this functionality?
Hive does not provide such a mechanism. You can achieve this by using a built-in UDF in your select:
from_unixtime(unix_timestamp()) as created_at
Note that this is executed in each mapper or reducer and may return different values. If you need the same value for the whole dataset (for Hive versions before 1.2.0), pass a variable to the script and use it inside as:
'${hiveconf:created_at}' as created_at
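A minimal sketch of the variable approach. The script name load.hql, the tables events and events_staging, and their columns are hypothetical, and the CAST assumes created_at is a TIMESTAMP column:

```sql
-- Invoked as (hypothetical file name and value):
--   hive --hiveconf created_at="2015-06-01 12:00:00" -f load.hql
-- Hive substitutes ${hiveconf:created_at} textually before the query runs,
-- so every row receives the same literal value.
INSERT INTO TABLE events
SELECT s.id,
       s.payload,
       CAST('${hiveconf:created_at}' AS TIMESTAMP) AS created_at
FROM events_staging s;
```

Because the value is fixed once by the caller, it does not matter how many mappers or reducers evaluate the select.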
Update:
From the Hive documentation:

current_timestamp
Returns the current timestamp at the start of query evaluation (as of Hive 1.2.0). All calls of current_timestamp within the same query return the same value.

unix_timestamp()
Gets current Unix timestamp in seconds. This function is non-deterministic and prevents proper optimization of queries - this has been deprecated since 2.0 in favour of CURRENT_TIMESTAMP constant.

So CURRENT_TIMESTAMP is not a function, it's a constant! See the docs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

For Hive queries, CURRENT_TIMESTAMP is preferable when you rewrite tables or partitions or insert into them, because whole files are rewritten rather than individual records, so the created_at timestamp should be the same for all rows.
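As an illustration, a partition rewrite that stamps every row with the same value might look like the sketch below. The tables events and events_staging, their columns, and the partition column dt are hypothetical; it assumes Hive 1.2.0 or later:

```sql
-- Hypothetical tables: events (partitioned by dt) and events_staging.
INSERT OVERWRITE TABLE events PARTITION (dt = '2015-06-01')
SELECT s.id,
       s.payload,
       current_timestamp AS created_at  -- evaluated once per query, so identical for every row
FROM events_staging s
WHERE s.dt = '2015-06-01';
```

An updated_at column can be populated the same way whenever the partition is rewritten.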