Can anyone tell me the difference between Hive's external tables and internal tables? I know the difference arises when dropping a table, but I don't understand what is meant by "the data and metadata are deleted for internal tables, while only the metadata is deleted for external tables". Can anyone explain this in terms of where things live on the cluster nodes, please?
Hive stores only the metadata in the metastore; the original data lives outside of Hive. When we use an external table, we can specify a LOCATION clause, so our original data isn't affected when we drop the table.
Internal tables are useful if you want Hive to manage the complete lifecycle of your data, including deletion, whereas external tables are useful when the files are also being used outside of Hive.
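A quick way to check which kind of table you are dealing with (a minimal sketch; `my_table` is just a placeholder name):

    -- Prints "Table Type: MANAGED_TABLE" or "Table Type: EXTERNAL_TABLE",
    -- along with the storage path in the "Location:" field.
    DESCRIBE FORMATTED my_table;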
INTERNAL: the table is created first and the data is loaded later.
EXTERNAL: the data is already present and the table is created on top of it.
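A minimal HiveQL sketch of the two patterns (table names, columns, and paths are all placeholders, not from the original answers):

    -- Internal (managed) table: create first, load data later.
    -- LOAD DATA moves the file under Hive's warehouse directory.
    CREATE TABLE page_views (user_id STRING, url STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

    LOAD DATA INPATH '/tmp/page_views.tsv' INTO TABLE page_views;

    -- External table: the data already sits in HDFS; the table is
    -- just a schema layered on top of that existing location.
    CREATE EXTERNAL TABLE page_views_ext (user_id STRING, url STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/logs/page_views/';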
Hive has a relational database on the master node that it uses to keep track of state. For instance, when you run

    CREATE TABLE FOO(foo string) LOCATION 'hdfs://tmp/';

this table schema is stored in that database. If you have a partitioned table, the partitions are stored there as well (this allows Hive to use lists of partitions without going to the file system and scanning for them). These sorts of things are the 'metadata'.
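For example, with a partitioned table the partition list is answered straight from the metastore (a sketch; the table and partition names are hypothetical):

    -- Partitions are registered in the metastore, so listing them
    -- does not require scanning HDFS directories.
    CREATE TABLE events (id STRING) PARTITIONED BY (dt STRING);
    ALTER TABLE events ADD PARTITION (dt='2020-10-26');
    SHOW PARTITIONS events;  -- served from metastore metadata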
When you drop an internal table, Hive drops the data, and it also drops the metadata.
When you drop an external table, it only drops the metadata. That means Hive is now ignorant of that data; it does not touch the data itself.
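To see the difference concretely, continuing the placeholder tables from the sketch above (paths are assumptions; the `dfs` shell works from the Hive CLI):

    -- Managed table: both the metastore entry and the files under
    -- the warehouse directory are removed.
    DROP TABLE page_views;

    -- External table: only the metastore entry is removed; the
    -- files under /data/logs/page_views/ are left untouched.
    DROP TABLE page_views_ext;

    -- Verify from the Hive CLI that the external data survived.
    dfs -ls /data/logs/page_views/;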
Consider this scenario, which suits external tables well:
A MapReduce (MR) job filters a huge log file and spits out n sub log files (e.g. each sub log file contains logs of a specific message type), and these n sub log files are stored in HDFS. These log files are to be loaded into Hive tables for further analytics. In this scenario I would recommend external table(s), because the actual log files are generated and owned by an external process (the MR job), and besides, you avoid the additional step of loading each generated log file into its respective Hive table; see the sketch below.
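A sketch of what that could look like, assuming the MR job writes each message type to its own HDFS directory (all paths, table names, and columns here are hypothetical):

    -- One external table per message type, layered directly over the
    -- directories the MR job already writes to; no LOAD step needed.
    CREATE EXTERNAL TABLE error_logs (ts STRING, message STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/output/logs/ERROR/';

    CREATE EXTERNAL TABLE warn_logs (ts STRING, message STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/output/logs/WARN/';

    -- New files the MR job drops into those directories become visible
    -- to queries immediately, and DROP TABLE leaves them in place.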