Hive External table-CSV File- Header row

Posted 2020-02-17 07:52

Question:

Below is the Hive table I have created:

CREATE EXTERNAL TABLE Activity (
  column1 type,
  column2 type
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/exttable/';

In my HDFS location /exttable, I have many CSV files, and each CSV file contains a header row. When I run select queries, the results include the header rows as well.

Is there any way in Hive to ignore the header row or first line?

Answer 1:

If you are using Hive version 0.13.0 or higher, you can specify "skip.header.line.count"="1" in your table properties to have Hive skip the header row at query time.

For detailed information on the patch see: https://issues.apache.org/jira/browse/HIVE-5795
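The property can also be applied to a table that already exists, without recreating it. A minimal sketch, using the table name from the question:

```sql
-- Hedged sketch: sets the header-skip property on the existing table
ALTER TABLE Activity SET TBLPROPERTIES ("skip.header.line.count"="1");
```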



Answer 2:

As of Hive 0.13.0, you can skip the header row by adding the following to your table definition:


tblproperties ("skip.header.line.count"="1");



Answer 3:

There is not. However, you can pre-process your files to skip the first row before loading them into HDFS:

tail -n +2 withfirstrow.csv > withoutfirstrow.csv
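Since the question mentions many CSV files, the same command can be run over a whole directory before uploading. A minimal sketch, assuming the files sit in a local staging directory (the directory names are hypothetical):

```shell
#!/bin/sh
# Strip the header row from every CSV in ./staging, writing the
# results to ./noheader so the originals stay untouched.
mkdir -p noheader
for f in staging/*.csv; do
  # tail -n +2 prints from the second line onward, dropping the header
  tail -n +2 "$f" > "noheader/$(basename "$f")"
done
```

The header-free files in noheader/ can then be uploaded with hadoop fs -put.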

Alternatively, you can build a WHERE clause into your Hive query to filter out the header rows.
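A minimal sketch of that approach, assuming the header rows carry the literal text 'column1' in the first field (the filter value is hypothetical and depends on your actual header):

```sql
-- Hedged sketch: drop rows whose first field still holds the header text
SELECT *
FROM Activity
WHERE column1 <> 'column1';
```

This works without any table changes, but the filter runs on every query, so the table-property approach is preferable when your Hive version supports it.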



Answer 4:

If your Hive version doesn't support tblproperties ("skip.header.line.count"="1"), you can use the Unix command below to drop the first line (the column header) and then put the file into HDFS.

 sed -n '2,$p' File_with_header.csv > File_with_No_header.csv


Answer 5:

Let's say you want to load a CSV file like the one below, located at /home/test/que.csv:

1,TAP (PORTUGAL),AIRLINE
2,ANSA INTERNATIONAL,AUTO RENTAL
3,CARLTON HOTELS,HOTEL-MOTEL

Next, put the file into an HDFS location that will hold this data:

hadoop fs -put /home/test/que.csv /user/mcc

The next step is to create a table. Hive offers two types, managed and external; choose the one that fits how you want the data's lifecycle handled.

Example for an external table:

create external table industry_ 
(
MCC string ,
MCC_Name string,
MCC_Group string
)       
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/mcc/'
tblproperties ("skip.header.line.count"="1");
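To confirm the header is being skipped, a quick check query on the new table (a sketch, assuming the table above was created as shown):

```sql
-- Should return the three data rows, without the header row
SELECT * FROM industry_ LIMIT 3;
```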


Answer 6:

To remove the header from the CSV file in place, use:

sed -i 1d filename.csv 


Tags: hive