How to delete and update a record in Hive

2020-01-25 12:47发布

I have install Hadoop, Hive, Hive JD BC. which are running fine for me. But I still have a problem. How to delete or update a single record using Hive because delete or update command of MySQL is not working in hive.

Thanks

hive> delete from student where id=1;
Usage: delete [FILE|JAR|ARCHIVE] <value> [<value>]*
Query returned non-zero code: 1, cause: null

14条回答
干净又极端
2楼-- · 2020-01-25 13:24

To achieve your current need, you need to fire below query

> insert overwrite table student 
> select *from student 
> where id <> 1;

This will delete current table and create new table with same name with all rows except the rows that you want to exclude/delete

I tried this on Hive 1.2.1

查看更多
Rolldiameter
3楼-- · 2020-01-25 13:29

Recently I was looking to resolve a similar issue, Apache Hive, Hadoop do not support Update/Delete operations. So ? So you have two ways:

  1. Use a backup table: Save the whole table in a backup_table, then truncate your input table, then re-write only the data you are intrested to mantain.
  2. Use Uber Hudi: It's a framework created by Uber to resolve the HDFS limitations including Deletion and Update. You can give a look in this link: https://eng.uber.com/hoodie/

an example for point 1:

Create table bck_table like input_table;
Insert overwrite table bck_table 
    select * from input_table;
Truncate table input_table;
Insert overwrite table input_table
    select * from bck_table where id <> 1;

NB: If the input_table is an external table you must follow the following link: How to truncate a partitioned external table in hive?

查看更多
干净又极端
4楼-- · 2020-01-25 13:30

Upcoming version of Hive is going to allow SET based update/delete handling which is of utmost importance when trying to do CRUD operations on a 'bunch' of rows instead of taking one row at a time.

In the interim , I have tried a dynamic partition based approach documented here http://linkd.in/1Fq3wdb .

Please see if it suits your need.

查看更多
forever°为你锁心
5楼-- · 2020-01-25 13:35

Yes, rightly said. Hive does not support UPDATE option. But the following alternative could be used to achieve the result:

Update records in a partitioned Hive table:

  1. The main table is assumed to be partitioned by some key.
  2. Load the incremental data (the data to be updated) to a staging table partitioned with the same keys as the main table.
  3. Join the two tables (main & staging tables) using a LEFT OUTER JOIN operation as below:

insert overwrite table main_table partition (c,d) select t2.a, t2.b, t2.c,t2.d from staging_table t2 left outer join main_table t1 on t1.a=t2.a;

In the above example, the main_table & the staging_table are partitioned using the (c,d) keys. The tables are joined via a LEFT OUTER JOIN and the result is used to OVERWRITE the partitions in the main_table.

A similar approach could be used in the case of un-partitioned Hive table UPDATE operations too.

查看更多
Root(大扎)
6楼-- · 2020-01-25 13:36

If you want to delete all records then as a workaround load an empty file into table in OVERWRITE mode

hive> LOAD DATA LOCAL INPATH '/root/hadoop/textfiles/empty.txt' OVERWRITE INTO TABLE employee;
Loading data to table default.employee
Table default.employee stats: [numFiles=1, numRows=0, totalSize=0, rawDataSize=0]
OK
Time taken: 0.19 seconds

hive> SELECT * FROM employee;
OK
Time taken: 0.052 seconds
查看更多
Ridiculous、
7楼-- · 2020-01-25 13:37

As of Hive version 0.14.0: INSERT...VALUES, UPDATE, and DELETE are now available with full ACID support.

INSERT ... VALUES Syntax:

INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]

Where values_row is: ( value [, value ...] ) where a value is either null or any valid SQL literal

UPDATE Syntax:

UPDATE tablename SET column = value [, column = value ...] [WHERE expression]

DELETE Syntax:

DELETE FROM tablename [WHERE expression]

Additionally, from the Hive Transactions doc:

If a table is to be used in ACID writes (insert, update, delete) then the table property "transactional" must be set on that table, starting with Hive 0.14.0. Without this value, inserts will be done in the old style; updates and deletes will be prohibited.

Hive DML reference:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
Hive Transactions reference:
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions

查看更多
登录 后发表回答