By enabling transactions in Hive, we can update records. Assume I'm using the Avro format for my Hive table.
https://hortonworks.com/hadoop-tutorial/using-hive-acid-transactions-insert-update-delete-data/
How does Hive take care of updating an Avro file and replicating it again across servers (since the replication factor is 3)?
I could not find a good article that explains this, or the consequences of using ACID in Hive. Since HDFS is recommended for write-once or append-only files, how does updating a record in place work?
Please advise.
Data for the table is stored in a set of base files. New records, updates, and deletes are stored in delta files. A new set of delta files is created for each transaction (or in the case of streaming agents such as Flume or Storm, each batch of transactions) that alters a table. At read time the reader merges the base and delta files, applying any updates and deletes as it reads.
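As a rough sketch of what this looks like in practice (the table name, columns, and values are made up for illustration), a transactional table and a row-level update in HiveQL might be written as follows; note that the UPDATE does not rewrite the base file in place, it writes a new delta directory (e.g. something like delta_0000002_0000002 under the table location) that readers merge with the base at query time:

    -- Full ACID tables must be stored as ORC and flagged transactional
    -- (older Hive versions also required CLUSTERED BY ... INTO n BUCKETS).
    CREATE TABLE employees (
      id     INT,
      name   STRING,
      salary DECIMAL(10,2)
    )
    STORED AS ORC
    TBLPROPERTIES ('transactional'='true');

    INSERT INTO employees VALUES (1, 'Alice', 50000.00);

    -- Writes a new delta directory rather than modifying the base file.
    UPDATE employees SET salary = 55000.00 WHERE id = 1;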
Periodically, a major compaction merges the delta files and the existing base file into a new base file, which speeds up subsequent table scans.
Inserted/updated/deleted data are periodically compacted to save space and optimize data access.
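Compaction normally runs in the background, but it can also be triggered and monitored manually. A minimal sketch (the table name is again hypothetical):

    -- Queue a minor or major compaction for the table
    ALTER TABLE employees COMPACT 'minor';
    ALTER TABLE employees COMPACT 'major';

    -- Check the state of queued/running/completed compactions
    SHOW COMPACTIONS;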
The ACID transaction feature currently has these limitations:
ACID tables do not support the Avro format; full ACID tables must be stored as ORC. HDFS block replication is unchanged for ACID tables: the base and delta files are ordinary HDFS files, so they are replicated according to the configured replication factor like any other file.
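For completeness, the server-side settings typically needed to enable ACID transactions (set in hive-site.xml or per session) look roughly like this; the exact set varies by Hive version, so treat it as a sketch rather than a definitive list:

    -- Usually required on the client/session side (Hive 1.x/2.x)
    SET hive.support.concurrency = true;
    SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
    SET hive.exec.dynamic.partition.mode = nonstrict;
    SET hive.enforce.bucketing = true;   -- older versions only

    -- Needed on the metastore side so the background compactor runs
    SET hive.compactor.initiator.on = true;
    SET hive.compactor.worker.threads = 1;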
The links below may be helpful for understanding ACID tables in Hive:
http://docs.qubole.com/en/latest/user-guide/hive/use-hive-acid.html
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions