Can anyone tell me the difference between Hive's external tables and internal tables? I know the difference arises when dropping a table, but I don't understand what is meant by "the data and metadata are deleted for internal tables, while only the metadata is deleted for external tables". Can anyone explain this in terms of where things live on the cluster nodes, please?
Hive stores only the metadata in the metastore; the original data lives outside of Hive. When we use an external table, we can specify a LOCATION clause, so our original data isn't affected when we drop the table.
Internal tables are useful if you want Hive to manage the complete lifecycle of your data, including deletion, whereas external tables are useful when the files are also being used outside of Hive.
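A quick way to check which kind of table you are dealing with (a minimal sketch; `my_table` is just a placeholder name):

    -- Prints "Table Type: MANAGED_TABLE" or "Table Type: EXTERNAL_TABLE",
    -- along with the storage path in the "Location:" field.
    DESCRIBE FORMATTED my_table;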
INTERNAL: the table is created first and the data is loaded later.
EXTERNAL: the data is already present and the table is created on top of it.
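A minimal HiveQL sketch of the two patterns (table names, columns, and paths are all placeholders, not from the original answers):

    -- Internal (managed) table: create first, load data later.
    -- LOAD DATA moves the file under Hive's warehouse directory.
    CREATE TABLE page_views (user_id STRING, url STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

    LOAD DATA INPATH '/tmp/page_views.tsv' INTO TABLE page_views;

    -- External table: the data already sits in HDFS; the table is
    -- just a schema layered on top of that existing location.
    CREATE EXTERNAL TABLE page_views_ext (user_id STRING, url STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/logs/page_views/';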
Hive has a relational database on the master node that it uses to keep track of state. For instance, when you run

    CREATE TABLE FOO(foo string) LOCATION 'hdfs://tmp/';

this table schema is stored in that database. If you have a partitioned table, the partitions are stored there as well (this allows Hive to use lists of partitions without going to the file system and scanning for them). These sorts of things are the 'metadata'.
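For example, with a partitioned table the partition list is answered straight from the metastore (a sketch; the table and partition names are hypothetical):

    -- Partitions are registered in the metastore, so listing them
    -- does not require scanning HDFS directories.
    CREATE TABLE events (id STRING) PARTITIONED BY (dt STRING);
    ALTER TABLE events ADD PARTITION (dt='2020-10-26');
    SHOW PARTITIONS events;  -- served from metastore metadata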
When you drop an internal table, Hive drops the data, and it also drops the metadata.
When you drop an external table, it only drops the metadata. That means Hive is now ignorant of that data; it does not touch the data itself.
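To see the difference concretely, continuing the placeholder tables from the sketch above (paths are assumptions; the `dfs` shell works from the Hive CLI):

    -- Managed table: both the metastore entry and the files under
    -- the warehouse directory are removed.
    DROP TABLE page_views;

    -- External table: only the metastore entry is removed; the
    -- files under /data/logs/page_views/ are left untouched.
    DROP TABLE page_views_ext;

    -- Verify from the Hive CLI that the external data survived.
    dfs -ls /data/logs/page_views/;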
Consider this scenario, which suits external tables well:
A MapReduce (MR) job filters a huge log file and spits out n sub log files (e.g. each sub log file contains logs of a specific message type), and these n sub log files are stored in HDFS. These log files are to be loaded into Hive tables for further analytics. In this scenario I would recommend external table(s), because the actual log files are generated and owned by an external process (the MR job), and besides, you avoid the additional step of loading each generated log file into its respective Hive table; see the sketch below.
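A sketch of what that could look like, assuming the MR job writes each message type to its own HDFS directory (all paths, table names, and columns here are hypothetical):

    -- One external table per message type, layered directly over the
    -- directories the MR job already writes to; no LOAD step needed.
    CREATE EXTERNAL TABLE error_logs (ts STRING, message STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/output/logs/ERROR/';

    CREATE EXTERNAL TABLE warn_logs (ts STRING, message STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/output/logs/WARN/';

    -- New files the MR job drops into those directories become visible
    -- to queries immediately, and DROP TABLE leaves them in place.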