I am using Cloudera's version of Hive and trying to create an external table over a csv file that contains the column names in the first column. Here is the code that I am using to do that.
CREATE EXTERNAL TABLE Test (
RecordId int,
FirstName string,
LastName string
)
ROW FORMAT serde 'com.bizo.hive.serde.csv.CSVSerde'
WITH SerDeProperties (
"separatorChar" = ","
)
STORED AS TEXTFILE
LOCATION '/user/File.csv'
Sample Data
RecordId,FirstName,LastName
1,"John","Doe"
2,"Jane","Doe"
Can anyone help me with how to skip the first row or do I need to add an intermediate step?
I also struggled with this and found no way to tell hive to skip first row, like there is e.g. in Greenplum. So finally I had to remove it from the files. e.g. "cat File.csv | grep -v RecordId > File_no_header.csv"