Importing data from HDFS to Hive table

Posted 2019-04-30 05:29

Question:

I have my data in data/2011/01/13/0100/file in HDFS. Each of these files contains tab-separated data: name, ip, url.

I want to create a table in Hive and import the data from HDFS; the table should contain time, name, ip, and url.

How can I import these using Hive? Or should the data be in some other format so that I can import the time as well?

Answer 1:

To do this you have to use partitions; read more about them here:

  • http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Add_Partitions
  • partition column in hive
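The idea behind the links above: partitions let the date/time components live in the directory path rather than inside the files themselves. A minimal sketch, with table and column names chosen for illustration (they are not from the question):

```sql
-- Illustrative sketch: the partition columns (year/month/day/hour) are not
-- stored in the data files; Hive derives them from each partition's path.
CREATE TABLE log_data (name STRING, ip STRING, url STRING)
PARTITIONED BY (year INT, month INT, day INT, hour INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Partition columns can be queried like ordinary columns, which is how
-- the time information ends up in the result set.
SELECT name, ip, url, day, hour
FROM log_data
WHERE year = 2011 AND month = 1;
```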


Answer 2:

You need to create the table to load the files into and then use the LOAD DATA command to load the files into the Hive tables. See the Hive documentation for the precise syntax to use.
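As a hedged sketch of those two steps (the table name is illustrative, and the exact syntax should be checked against the Hive documentation):

```sql
-- A non-partitioned table matching the tab-separated files.
CREATE TABLE log_data (name STRING, ip STRING, url STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- LOAD DATA with an HDFS path (no LOCAL keyword) moves the file from its
-- current HDFS location into Hive's warehouse directory for this table.
LOAD DATA INPATH 'data/2011/01/13/0100/file' INTO TABLE log_data;
```

Note that this simple approach loses the time information encoded in the directory path; a partitioned table (as in the other answers) keeps it.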

Regards, Jeff



Answer 3:

You can create an external table for such data.

Something like:

CREATE EXTERNAL TABLE log_data (name STRING, ip STRING, url STRING)
PARTITIONED BY (year BIGINT, month BIGINT, day BIGINT, hour BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION 'data';
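One caveat worth adding: Hive does not discover partition directories on its own, so each one has to be registered before queries see its data. A sketch assuming the directory layout from the question, where data/2011/01/13/0100 holds the files for 2011-01-13 at hour 01:

```sql
-- Register one partition directory with the external table; repeat (or
-- script) this for each year/month/day/hour directory.
ALTER TABLE log_data ADD PARTITION (year = 2011, month = 1, day = 13, hour = 1)
LOCATION 'data/2011/01/13/0100';

-- After that, the partition columns supply the time values the question
-- asks for, without the files themselves containing them.
SELECT name, ip, url, hour
FROM log_data
WHERE year = 2011 AND month = 1 AND day = 13;
```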



Tags: hadoop hdfs hive