How to delete and update a record in Hive

2020-01-25 12:47发布

I have install Hadoop, Hive, Hive JD BC. which are running fine for me. But I still have a problem. How to delete or update a single record using Hive because delete or update command of MySQL is not working in hive.

Thanks

hive> delete from student where id=1;
Usage: delete [FILE|JAR|ARCHIVE] <value> [<value>]*
Query returned non-zero code: 1, cause: null

14条回答
The star\"
2楼-- · 2020-01-25 13:39

The CLI told you where is your mistake : delete WHAT? from student ...

Delete : How to delete/truncate tables from Hadoop-Hive?

Update : Update , SET option in Hive

查看更多
手持菜刀,她持情操
3楼-- · 2020-01-25 13:41

UPDATE or DELETE a record isn't allowed in Hive, but INSERT INTO is acceptable.
A snippet from Hadoop: The Definitive Guide(3rd edition):

Updates, transactions, and indexes are mainstays of traditional databases. Yet, until recently, these features have not been considered a part of Hive's feature set. This is because Hive was built to operate over HDFS data using MapReduce, where full-table scans are the norm and a table update is achieved by transforming the data into a new table. For a data warehousing application that runs over large portions of the dataset, this works well.

Hive doesn't support updates (or deletes), but it does support INSERT INTO, so it is possible to add new rows to an existing table.

查看更多
【Aperson】
4楼-- · 2020-01-25 13:43

Configuration Values to Set for INSERT, UPDATE, DELETE In addition to the new parameters listed above, some existing parameters need to be set to support INSERT ... VALUES, UPDATE, and DELETE.

Configuration key Must be set to

hive.support.concurrency true (default is false) hive.enforce.bucketing true (default is false) (Not required as of Hive 2.0) hive.exec.dynamic.partition.mode nonstrict (default is strict)

Configuration Values to Set for Compaction

If the data in your system is not owned by the Hive user (i.e., the user that the Hive metastore runs as), then Hive will need permission to run as the user who owns the data in order to perform compactions. If you have already set up HiveServer2 to impersonate users, then the only additional work to do is assure that Hive has the right to impersonate users from the host running the Hive metastore. This is done by adding the hostname to hadoop.proxyuser.hive.hosts in Hadoop's core-site.xml file. If you have not already done this, then you will need to configure Hive to act as a proxy user. This requires you to set up keytabs for the user running the Hive metastore and add hadoop.proxyuser.hive.hosts and hadoop.proxyuser.hive.groups to Hadoop's core-site.xml file. See the Hadoop documentation on secure mode for your version of Hadoop (e.g., for Hadoop 2.5.1 it is at Hadoop in Secure Mode).

The UPDATE statement has the following limitations:

The expression in the WHERE clause must be an expression supported by a Hive SELECT clause.

Partition and bucket columns cannot be updated.

Query vectorization is automatically disabled for UPDATE statements. However, updated tables can still be queried using vectorization.

Subqueries are not allowed on the right side of the SET statement.

The following example demonstrates the correct usage of this statement:

UPDATE students SET name = null WHERE gpa <= 1.0;

DELETE Statement

Use the DELETE statement to delete data already written to Apache Hive.

DELETE FROM tablename [WHERE expression];

The DELETE statement has the following limitation: query vectorization is automatically disabled for the DELETE operation. However, tables with deleted data can still be queried using vectorization.

The following example demonstrates the correct usage of this statement:

DELETE FROM students WHERE gpa <= 1,0;

查看更多
家丑人穷心不美
5楼-- · 2020-01-25 13:44

You can delete rows from a table using a workaround, in which you overwrite the table by the dataset you want left into the table as a result of your operation.

insert overwrite table your_table 
    select * from your_table 
    where id <> 1
;

The workaround is useful mostly for bulk deletions of easily identifiable rows. Also, obviously doing this can muck up your data, so a backup of the table is adviced and care when planning the "deletion" rule also adviced.

查看更多
孤傲高冷的网名
6楼-- · 2020-01-25 13:45

Good news,Insert updates and deletes are now possible on Hive/Impala using Kudu.

You need to use IMPALA/kudu to maintain the tables and perform insert/update/delete records. Details with examples can be found here: insert-update-delete-on-hadoop

Please share the news if you are excited.

-MIK

查看更多
我欲成王,谁敢阻挡
7楼-- · 2020-01-25 13:46

Delete has been recently added in Hive version 0.14 Deletes can only be performed on tables that support ACID Below is the link from Apache .

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Delete

查看更多
登录 后发表回答