I am trying to read CSV data from an S3 bucket and create a table in AWS Athena. The table is created, but it does not skip the header row of my CSV file.
Query example:
CREATE EXTERNAL TABLE IF NOT EXISTS table_name (
  `event_type_id` string,
  `customer_id` string,
  `date` string,
  `email` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = "|", "quoteChar" = "\"")
LOCATION 's3://location/'
TBLPROPERTIES ("skip.header.line.count"="1");
The skip.header.line.count property doesn't seem to have any effect. I think AWS may have an issue with it. Is there any other way I could get around this?
This is a known deficiency.
The best method I've seen was tweeted by Eric Hammond. It appears to skip header lines during a query; I'm not sure how it works, but it might be a method for skipping NULLs.
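The tweeted snippet itself isn't reproduced here, so the following is only a rough sketch of the general query-time idea (filtering out the row that still contains the header text). The table and column names come from the question, and the literal 'event_type_id' is assumed to be the text in the CSV header:

-- A sketch of the query-time workaround, not Eric Hammond's actual tweet.
-- OpenCSVSerde reads every column as a string, so the leftover header row
-- can be filtered out by comparing a column to its own header label.
SELECT *
FROM table_name
WHERE event_type_id <> 'event_type_id';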
This is what works in Redshift:
You want to use table properties ('skip.header.line.count'='1'), along with other properties if you want, e.g. 'numRows'='100'. Here's a sample:
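The original sample isn't included above, so here is a minimal sketch of a Redshift Spectrum external table using those properties. The spectrum schema name, column types, and S3 path are placeholders, and the date column is renamed event_date because DATE is a reserved word in Redshift:

-- Hypothetical Redshift Spectrum external table, for illustration only.
CREATE EXTERNAL TABLE spectrum.events (
  event_type_id VARCHAR(32),
  customer_id   VARCHAR(32),
  event_date    VARCHAR(32),
  email         VARCHAR(256)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 's3://your-bucket/path/'
TABLE PROPERTIES ('skip.header.line.count'='1', 'numRows'='100');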