By enabling transactions in Hive, we can update records. Assume I'm using the Avro format for my Hive table.
https://hortonworks.com/hadoop-tutorial/using-hive-acid-transactions-insert-update-delete-data/
How does Hive take care of updating an Avro file and replicating it again across servers (since the replication factor is 3)?
I could not find a good article that explains this, or the consequences of using ACID in Hive. Since HDFS is recommended for write-once or append-only files, how does updating a record in place work?
Please advise.
Data for the table is stored in a set of base files. New records, updates, and deletes are stored in delta files. A new set of delta files is created for each transaction (or in the case of streaming agents such as Flume or Storm, each batch of transactions) that alters a table. At read time the reader merges the base and delta files, applying any updates and deletes as it reads.
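As a rough sketch of what this looks like in practice (the table name, columns, and values are made up for illustration), a transactional table and a row-level update in HiveQL might be written as follows; note that the UPDATE does not rewrite the base file in place, it writes a new delta directory (e.g. something like delta_0000002_0000002 under the table location) that readers merge with the base at query time:

    -- Full ACID tables must be stored as ORC and flagged transactional
    -- (older Hive versions also required CLUSTERED BY ... INTO n BUCKETS).
    CREATE TABLE employees (
      id     INT,
      name   STRING,
      salary DECIMAL(10,2)
    )
    STORED AS ORC
    TBLPROPERTIES ('transactional'='true');

    INSERT INTO employees VALUES (1, 'Alice', 50000.00);

    -- Writes a new delta directory rather than modifying the base file.
    UPDATE employees SET salary = 55000.00 WHERE id = 1;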
Periodically, a major compaction merges the delta files and the existing base file into a new base file, which speeds up subsequent table scans.
Inserted/updated/deleted data are periodically compacted to save space and optimize data access.
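Compaction normally runs in the background, but it can also be triggered and monitored manually. A minimal sketch (the table name is again hypothetical):

    -- Queue a minor or major compaction for the table
    ALTER TABLE employees COMPACT 'minor';
    ALTER TABLE employees COMPACT 'major';

    -- Check the state of queued/running/completed compactions
    SHOW COMPACTIONS;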
The ACID transaction feature currently has these limitations:
ACID tables do not support the Avro format; full ACID tables must be stored as ORC. HDFS block replication is unchanged for ACID tables: the base and delta files are ordinary HDFS files, so they are replicated according to the configured replication factor like any other file.
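For completeness, the server-side settings typically needed to enable ACID transactions (set in hive-site.xml or per session) look roughly like this; the exact set varies by Hive version, so treat it as a sketch rather than a definitive list:

    -- Usually required on the client/session side (Hive 1.x/2.x)
    SET hive.support.concurrency = true;
    SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
    SET hive.exec.dynamic.partition.mode = nonstrict;
    SET hive.enforce.bucketing = true;   -- older versions only

    -- Needed on the metastore side so the background compactor runs
    SET hive.compactor.initiator.on = true;
    SET hive.compactor.worker.threads = 1;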
The links below may be helpful for understanding ACID tables in Hive:
http://docs.qubole.com/en/latest/user-guide/hive/use-hive-acid.html
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions