I have created the hive external table using below command:
use hive2;
create external table depTable (depId int comment 'This is the unique id for each dep', depName string,location string) comment 'department table' row format delimited fields terminated by ","
stored as textfile location '/dataDir/';
Now, when I view the HDFS I can see the db but there is no depTable
inside the warehouse.
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/hive2.db
[cloudera@quickstart ~]$
Above you can see that there is no table created in this DB. As far as I know, external tables are not stored in the hive warehouse.So am I correct ?? If yes then where is it stored ??
But if I create external table first and then load the data then I am able to see the file inside hive2.db
.
hive> create external table depTable (depId int comment 'This is the unique id for each dep', depName string,location string) comment 'department table' row format delimited fields terminated by "," stored as textfile;
OK
Time taken: 0.056 seconds
hive> load data inpath '/dataDir/department_data.txt' into table depTable;
Loading data to table default.deptable
Table default.deptable stats: [numFiles=1, totalSize=90]
OK
Time taken: 0.28 seconds
hive> select * from deptable;
OK
1001 FINANCE SYDNEY
2001 AUDIT MELBOURNE
3001 MARKETING PERTH
4001 PRODUCTION BRISBANE
Now, if I fire the hadoop fs
query I can see this table under database as below:
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/hive2.db
Found 1 items
drwxrwxrwx - cloudera supergroup 0 2019-01-17 09:07 /user/hive/warehouse/hive2.db/deptable
If I delete the table still I am able to see table in the HDFS as below:
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/hive2.db
Found 1 items
drwxrwxrwx - cloudera supergroup 0 2019-01-17 09:11 /user/hive/warehouse/hive2.db/deptable
So, what is the exact behavior of the external tables ?? When I create using LOCATION
keyword where does it get stored and when I create using load
statement why it is getting stored in the HDFS
and after deleting why it doesn't get deleted.
The main difference between
EXTERNAL
andMANAGED
tables is in Drop table/partition behavior. When you dropMANAGED
table/partition, the location with data files also removed. When you dropEXTERNAL
table, the location with data files remains as is.EXTERNAL
table as well asMANAGED
is being stored in the location specified in DDL. You can create table on top of existing location with data files already in the location and it will work for bothEXTERNAL
orMANAGED
, does not matter.You even can create both
EXTERNAL
andMANAGED
tables on top of the same location, see this answer with more details and tests: https://stackoverflow.com/a/54038932/2700344If you specified location, the data will be stored in that location for both types of tables. If you did not specify location, the data will be in default location:
/user/hive/warehouse/database_name.db/table_name
for both managed and external tables.See also official Hive docs on Managed vs External Tables