Hive external table, CSV file: header row

Posted 2020-02-17 07:29

Below is the Hive table I have created:

CREATE EXTERNAL TABLE Activity (
  column1 type,
  column2 type
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/exttable/';

In my HDFS location /exttable, I have a lot of CSV files, and each CSV file also contains a header row. When I run SELECT queries, the result contains the header rows as well.

Is there any way in Hive to ignore the header row or first line?

Tags: hive
6 answers
萌系小妹纸
#2 · 2020-02-17 07:51

If you are using Hive version 0.13.0 or higher, you can specify "skip.header.line.count"="1" in your table properties so that the header line is skipped when the table is read.

For detailed information on the patch see: https://issues.apache.org/jira/browse/HIVE-5795
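For a table that already exists, such as the Activity table in the question, the property can also be added afterwards with ALTER TABLE. This is a minimal sketch, assuming the hive CLI is on the PATH; the script file name add_skip_header.hql is made up for illustration.

```shell
#!/usr/bin/env bash
# Write the HiveQL statement to a script file (the file name is hypothetical).
hql_file="add_skip_header.hql"
cat > "$hql_file" <<'EOF'
ALTER TABLE Activity SET TBLPROPERTIES ("skip.header.line.count"="1");
EOF

# Run the script only if the hive CLI is available on this machine.
if command -v hive >/dev/null 2>&1; then
    hive -f "$hql_file"
else
    echo "hive CLI not found; run $hql_file manually" >&2
fi
```

The files in HDFS are left untouched; the property only changes how they are read.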

Deceive 欺骗
#3 · 2020-02-17 08:09

Hive versions before 0.13.0 have no built-in way to do this. However, you can pre-process your files to strip the first row before loading them into HDFS:

tail -n +2 withfirstrow.csv > withoutfirstrow.csv
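Since the question mentions many CSV files, the same tail command can be wrapped in a loop. This is a rough sketch; the function name strip_headers and the directory names are illustrative and not part of the original answer.

```shell
#!/usr/bin/env bash
# strip_headers SRC_DIR DST_DIR
# Copy every *.csv in SRC_DIR to DST_DIR without its first line.
# (Function and directory names are illustrative.)
strip_headers() {
    local src_dir="$1"
    local dst_dir="$2"
    local f
    mkdir -p "$dst_dir" || return 1
    for f in "$src_dir"/*.csv; do
        [ -e "$f" ] || continue   # no matching files: skip the unexpanded pattern
        tail -n +2 "$f" > "$dst_dir/$(basename "$f")"
    done
}

# Example usage (paths are placeholders):
#   strip_headers ./with_headers ./without_headers
#   hadoop fs -put ./without_headers/*.csv /exttable/
```

Writing to a separate directory keeps the original files intact in case a file turns out not to have a header.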

Alternatively, you can add a WHERE clause to your Hive queries that filters out the header row.
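The WHERE clause idea could look roughly like this. It is a sketch only: it assumes the header cell of the first column is the literal text column1 and that the column is declared as a string, so the header survives as text. Neither detail is given in the question.

```shell
#!/usr/bin/env bash
# Filter out header rows at query time by comparing against the header text.
# Assumption: the header cell for column1 is literally 'column1'.
query="SELECT * FROM Activity WHERE column1 <> 'column1';"

if command -v hive >/dev/null 2>&1; then
    hive -e "$query"
else
    echo "$query"   # no hive CLI here: just show the statement
fi
```

Note that rows where column1 is NULL are filtered out as well, because NULL <> 'column1' does not evaluate to true.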

叛逆
#4 · 2020-02-17 08:10

Let's say you want to load a CSV file like the one below, located at /home/test/que.csv:

1,TAP (PORTUGAL),AIRLINE
2,ANSA INTERNATIONAL,AUTO RENTAL
3,CARLTON HOTELS,HOTEL-MOTEL

Now we need to put the file into an HDFS location that will hold this data.

hadoop fs -put /home/test/que.csv /user/mcc

The next step is to create a table. There are two types to choose from: managed and external.

Example for an external table:

create external table industry_ 
(
MCC string ,
MCC_Name string,
MCC_Group string
)       
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/mcc/'
tblproperties ("skip.header.line.count"="1");
我想做一个坏孩纸
#5 · 2020-02-17 08:13

If your Hive version doesn't support tblproperties ("skip.header.line.count"="1"), you can use the Unix command below to drop the first line (the column header) and then put the result in HDFS.

 sed -n '2,$p' File_with_header.csv > File_with_No_header.csv
不美不萌又怎样
#6 · 2020-02-17 08:14

You can skip the header row in Hive 0.13.0 and later by adding this to the table definition:

tblproperties ("skip.header.line.count"="1");

Rolldiameter
#7 · 2020-02-17 08:14

To remove the header from the CSV file in place (GNU sed syntax), use:

sed -i 1d filename.csv 
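Deleting line 1 unconditionally removes a data row if the command is run twice or the file never had a header. A more defensive variant might check the first line before deleting it. This is a sketch; the function name drop_header_if_present and the expected header text are assumptions for illustration. It uses a temporary file instead of sed -i so that it does not depend on GNU sed.

```shell
#!/usr/bin/env bash
# drop_header_if_present FILE EXPECTED_HEADER
# Delete the first line of FILE only when it equals EXPECTED_HEADER,
# so that running it twice does not remove a data row.
# (The function name is illustrative.)
drop_header_if_present() {
    local file="$1"
    local expected="$2"
    local first
    first=$(head -n 1 "$file") || return 1
    if [ "$first" = "$expected" ]; then
        # Write to a temporary file, then replace the original.
        sed '1d' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    fi
}

# Example usage (file name and header text are placeholders):
#   drop_header_if_present filename.csv "column1,column2"
```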