Can anyone tell me the difference between Hive's external tables and internal tables? I know the difference shows up when dropping the table. I don't understand what is meant by "the data and metadata are deleted for internal tables, and only the metadata is deleted for external tables". Can anyone explain it in terms of nodes, please?
In simple words, there are two cases.
Hive manages only the data inside its warehouse directory; it will not delete data that lives outside the warehouse. So when a table is dropped:
1) For internal tables, the data is managed internally, in the warehouse, so it is deleted.
2) For external tables, the data is managed externally, outside the warehouse, so it is not deleted, and clients other than Hive can also use it.
When there is data already in HDFS, an external Hive table can be created to describe the data. It is called EXTERNAL because the data in the external table is specified in the LOCATION properties instead of the default warehouse directory.
When keeping data in the internal tables, Hive fully manages the life cycle of the table and data. This means the data is removed once the internal table is dropped. If the external table is dropped, the table metadata is deleted but the data is kept. Most of the time, an external table is preferred to avoid deleting data along with tables by mistake.
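A minimal HiveQL sketch of the two cases described above. The table names, columns, and the /data/logs path are made up for illustration, and the warehouse path assumes the default hive.metastore.warehouse.dir; your cluster may be configured differently.

```sql
-- Internal (managed) table: Hive stores the files under its warehouse
-- directory, by default /user/hive/warehouse/managed_logs (assumed default).
CREATE TABLE managed_logs (id INT, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- External table: Hive only records the schema and points at files that
-- already exist in HDFS under the given LOCATION (hypothetical path).
CREATE EXTERNAL TABLE external_logs (id INT, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/logs';

-- Removes the metastore entry AND the files under the warehouse directory.
DROP TABLE managed_logs;

-- Removes only the metastore entry; the files under /data/logs stay in HDFS.
DROP TABLE external_logs;
```

After the second DROP, listing /data/logs with hadoop fs -ls should still show the files, while the managed table's warehouse directory is gone.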
An internal table's data is stored in the warehouse folder, whereas an external table's data is stored at the location you specified when creating the table.
So when you drop an internal table, Hive deletes the schema as well as the data under the warehouse folder, but for an external table it is only the schema that you lose.
So if you want an external table back after dropping it, you can create a table with the same schema again and point it to the original data location. Hope it is clear now.
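Getting an external table back can be sketched like this. The table name sales_ext, its columns, and the path /data/sales are hypothetical; the point is only that the second CREATE reuses the same schema and LOCATION.

```sql
-- Original definition (hypothetical names and path).
CREATE EXTERNAL TABLE sales_ext (order_id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/sales';

-- Dropping it deletes only the schema from the metastore.
DROP TABLE sales_ext;

-- Re-create it with the same schema, pointing at the same directory.
CREATE EXTERNAL TABLE sales_ext (order_id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/sales';

-- The rows are readable again because the files were never removed.
SELECT * FROM sales_ext LIMIT 10;
```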
To answer your question:
For external tables, Hive does not move the data into its warehouse directory. If the external table is dropped, then the table metadata is deleted but not the data.
For internal tables, Hive moves the data into its warehouse directory. If the table is dropped, then both the table metadata and the data are deleted.
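If you are not sure which kind of table you have, or where its data lives, something like the following may help. The table name page_views and the staging path are hypothetical.

```sql
-- A managed table; with default settings its directory is created under
-- the warehouse directory.
CREATE TABLE page_views (user_id INT, url STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- LOAD DATA INPATH moves (does not copy) the HDFS file into the table's
-- directory, so it no longer exists at the staging path afterwards.
LOAD DATA INPATH '/staging/page_views.tsv' INTO TABLE page_views;

-- The output includes a "Table Type" row (MANAGED_TABLE or EXTERNAL_TABLE)
-- and a "Location" row showing the directory that holds the data.
DESCRIBE FORMATTED page_views;
```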
For your reference,
Difference between Internal & External tables :
Hive tables may be internal or external. This choice affects how the data is loaded, controlled, and managed.
Use EXTERNAL tables when:
Use INTERNAL tables when:
Source :
HDInsight: Hive Internal and External Tables Intro
Internal & external tables in Hadoop- HIVE
In Hive we can also create an external table. It tells Hive to refer to data that is at an existing location outside the warehouse directory. Dropping an external table deletes the metadata but not the data.
If you drop an external table, only the schema of the table is deleted; the table data still exists at its physical location. So to delete the data as well, remove the directory yourself with hadoop fs -rm -r <table location> (the older hadoop fs -rmr form is deprecated). With a managed table, Hive has full control over the table and its data. With an external table, the user keeps control of the data.