I read that we cannot create a primary key on a column in a Hive table. However, I saw the DDL below elsewhere and ran it, and it worked without any problem.
create table prim(id int, name char(30))
TBLPROPERTIES("PRIMARY KEY"="id");
After this I ran "describe formatted prim" and saw that a key appears to have been created on the column id:
Table Parameters:
PRIMARY KEY id
I then inserted two records with the same ID into the table:
insert into prim values(1,'ABCD');
insert into prim values(1,'EFGH');
Both records were inserted into the table. What baffles me is this: I can understand that we cannot specify PRIMARY KEY in the create statement, but when it is given as TBLPROPERTIES("PRIMARY KEY"="id"), how is it different from a primary key in an RDBMS?
PRIMARY KEY in TBLPROPERTIES is only a metadata reference that preserves the column's significance. It does not apply any constraint to that column, so, unlike a primary key in an RDBMS, uniqueness is not enforced. It can be used as a reference from a design perspective.
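Because the property is plain metadata, any uniqueness check has to be done by hand. As a rough sketch against the prim table from the question (the cnt alias is just an illustrative name), the following reads the property back and then looks for ids that occur more than once:

```sql
-- Read back the table property; Hive stores it as a plain key/value pair.
SHOW TBLPROPERTIES prim("PRIMARY KEY");

-- Hive does not enforce uniqueness here, so duplicates must be found manually.
SELECT id, COUNT(*) AS cnt
FROM prim
GROUP BY id
HAVING COUNT(*) > 1;
```

If the second query returns rows, the column named in the property is not actually unique, which an RDBMS primary key would have prevented at insert time.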