Insert overwrite partition in Hive table - Values

2019-08-04 03:41发布

I created a Hive table with Non-partition table and using select query I inserted data into Partitioned Hive table.

Refered site

  1. By following above link my partition table contains duplicate values. Below are the setps

This is my Sample employee dataset:link1

I tried the following queries: link2

But after updating a value in Hive table,

Updating salary of Steven with EmployeeID 19 to 50000.

INSERT OVERWRITE TABLE Unm_Parti_Trail PARTITION (Department = 'A') SELECT employeeid,firstname,designation, CASE WHEN employeeid=19 THEN 50000 ELSE salary END AS salary FROM Unm_Parti_Trail;

the values are getting duplicated.

7       Nirmal  Tech    12000   A
7       Nirmal  Tech    12000   B

Nirmal is placed in Department A only but it is duplicated to department B.

Am I doing anything wrong?

Please suggest.

2条回答
Anthone
2楼-- · 2019-08-04 04:19

One possible solution.

When you do the insert it is necessary to select the partitioning fields as the last ones on the query. Eg:

INSERT INTO TABLE Unm_Parti_Trail PARTITION(department='A') 
SELECT EmployeeID, FirstName,Designation,Salary, Department 
FROM Unm_Dup_Parti_Trail
WHERE department='A';

See this link for more info.

查看更多
beautiful°
3楼-- · 2019-08-04 04:27

It seems like you forgot the WHERE clause in your last INSERT OVERWRITE:

INSERT INTO TABLE Unm_Parti_Trail PARTITION (Department = 'A') 
SELECT employeeid,firstname,designation, CASE WHEN employeeid=19 
THEN 50000 ELSE salary END AS salary FROM Unm_Parti_Trail 
WHERE department = 'A';
查看更多
登录 后发表回答