Difference between Hive internal tables and extern

2020-01-23 14:58发布

Can anyone tell me the difference between Hive's external table and internal tables. I know the difference comes when dropping the table. I don't understand what you mean by the data and metadata is deleted in internal and only metadata is deleted in external tables. Can anyone explain me in terms of nodes please.

17条回答
男人必须洒脱
2楼-- · 2020-01-23 15:32

In simple words, there are two things:

Hive can manage things in warehouse i.e. it will not delete data out of warehouse. When we delete table:

1) For internal tables the data is managed internally in warehouse. So will be deleted.

2) For external tables the data is managed eternal from warehouse. So can't be deleted and clients other then hive can also use it.

查看更多
Juvenile、少年°
3楼-- · 2020-01-23 15:33

When there is data already in HDFS, an external Hive table can be created to describe the data. It is called EXTERNAL because the data in the external table is specified in the LOCATION properties instead of the default warehouse directory.

When keeping data in the internal tables, Hive fully manages the life cycle of the table and data. This means the data is removed once the internal table is dropped. If the external table is dropped, the table metadata is deleted but the data is kept. Most of the time, an external table is preferred to avoid deleting data along with tables by mistake.

查看更多
一夜七次
4楼-- · 2020-01-23 15:35

An internal table data is stored in the warehouse folder, whereas an external table data is stored at the location you mentioned in table creation.

So when you delete an internal table, it deletes the schema as well as the data under the warehouse folder, but for an external table it's only the schema that you will loose.

So when you want an external table back you again after deleting it, can create a table with the same schema again and point it to the original data location. Hope it is clear now.

查看更多
男人必须洒脱
5楼-- · 2020-01-23 15:37

To answer you Question :

For External Tables ,Hive does not move the data into its warehouse directory. If the external table is dropped, then the table metadata is deleted but not the data.

For Internal tables , Hive moves data into its warehouse directory. If the table is dropped, then the table metadata and the data will be deleted.


For your reference,

Difference between Internal & External tables :

For External Tables -

  • External table stores files on the HDFS server but tables are not linked to the source file completely.

  • If you delete an external table the file still remains on the HDFS server.

    As an example if you create an external table called “table_test” in HIVE using HIVE-QL and link the table to file “file”, then deleting “table_test” from HIVE will not delete “file” from HDFS.

  • External table files are accessible to anyone who has access to HDFS file structure and therefore security needs to be managed at the HDFS file/folder level.

  • Meta data is maintained on master node, and deleting an external table from HIVE only deletes the metadata not the data/file.


For Internal Tables-

  • Stored in a directory based on settings in hive.metastore.warehouse.dir, by default internal tables are stored in the following directory “/user/hive/warehouse” you can change it by updating the location in the config file .
  • Deleting the table deletes the metadata and data from master-node and HDFS respectively.
  • Internal table file security is controlled solely via HIVE. Security needs to be managed within HIVE, probably at the schema level (depends on organization).

Hive may have internal or external tables, this is a choice that affects how data is loaded, controlled, and managed.

Use EXTERNAL tables when:

  • The data is also used outside of Hive. For example, the data files are read and processed by an existing program that doesn’t lock the files.
  • Data needs to remain in the underlying location even after a DROP TABLE. This can apply if you are pointing multiple schema (tables or views) at a single data set or if you are iterating through various possible schema.
  • Hive should not own data and control settings, directories, etc., you may have another program or process that will do those things.
  • You are not creating table based on existing table (AS SELECT).

Use INTERNAL tables when:

  • The data is temporary.
  • You want Hive to completely manage the life-cycle of the table and data.

Source :

HDInsight: Hive Internal and External Tables Intro

Internal & external tables in Hadoop- HIVE

查看更多
乱世女痞
6楼-- · 2020-01-23 15:37

In Hive We can also create an external table. It tells Hive to refer to the data that is at an existing location outside the warehouse directory. Dropping External tables will delete metadata but not the data.

查看更多
时光不老,我们不散
7楼-- · 2020-01-23 15:38

In external tables, if you drop it, it deletes only schema of the table, table data exists in physical location. So to deleted the data use hadoop fs - rmr tablename . Managed table hive will have full control on tables. In external tables users will have control on it.

查看更多
登录 后发表回答