I have my data in data/2011/01/13/0100/file in HDFS. Each of these files contains tab-separated data, say name, ip, url.
I want to create a table in Hive and import the data from HDFS. The table should contain time, name, ip and url.
How can I import these using Hive? Or should the data be in some other format so that I can import the time as well?
To do this you have to use partitions. Read more about them here:
- http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Add_Partitions
- partition column in hive
You need to create the table to load the files into and then use the LOAD DATA command to load the files into the Hive tables. See the Hive documentation for the precise syntax to use.
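As a rough sketch of that approach, assuming a hypothetical table name `logs` and hypothetical partition columns `dt` and `hr` (none of these come from the question), the time can be carried by the partition rather than by the file contents:

```sql
-- Hypothetical table: the time is stored in partition columns, not in the files.
CREATE TABLE logs (name STRING, ip STRING, url STRING)
PARTITIONED BY (dt STRING, hr STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Load one file into the partition matching its directory.
-- The path is the one from the question; the partition values are assumed from it.
LOAD DATA INPATH 'data/2011/01/13/0100/file'
INTO TABLE logs PARTITION (dt='2011-01-13', hr='0100');

-- Partition columns can be selected like ordinary columns.
SELECT dt, hr, name, ip, url FROM logs WHERE dt = '2011-01-13';
```

Note that LOAD DATA INPATH moves the file from its current HDFS location into the table's directory, so one statement is needed per file and the original directory layout is not preserved.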
Regards,
Jeff
You can create an external table for such data.
Something like:
CREATE EXTERNAL TABLE log_data (name STRING, ip STRING, url STRING)
PARTITIONED BY (year BIGINT, month BIGINT, day BIGINT, hour BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION 'data';
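Because the directories are named 2011/01/13/0100 rather than year=2011/month=01/..., Hive will not pick up the partitions on its own. Each one has to be registered explicitly. A minimal sketch for the directory from the question, assuming that 0100 means hour 1 and that the relative path resolves the same way as the table location (a fully qualified HDFS path may be needed in practice):

```sql
-- Register one existing directory as a partition of the external table.
-- hour=1 is an assumption about what the 0100 directory name means.
ALTER TABLE log_data ADD PARTITION (year=2011, month=1, day=13, hour=1)
LOCATION 'data/2011/01/13/0100';

-- The partition columns then supply the time alongside the file columns.
SELECT year, month, day, hour, name, ip, url
FROM log_data
WHERE year = 2011 AND month = 1 AND day = 13;
```

With an external table the files stay where they are, and dropping the table or a partition removes only the metadata, not the data.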