I need to read data from a flat file. It contains a number of lines but want to extract data from the line that looks like:
REVISION 12 30364918 Anarchism 2005-12-06T17:44:47Z RJII 141644
I only want the 2nd, 3rd and 5th entries on this line and put them into a Hive table; I have issued this command but get an error
create external table testTable (
tag string,
a string,
r string
)
row format SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES(
"input.regex" = "REVISION\s,[0-9]*,\s,[0-9]*,\s[a-zA-Z0-9]*\s,[0-9]*-[0-9]*-[0-9]*T[0-9]*:[0-9]*:[0-9]*Z",
"output.format.string" = "%1$s %2$s %3$s")
stored as textfile
location 'hdfs://location:8020/user/bd4-project1/enwiki-20080103-sample';
It doesnt seem to work and keeps giving an exception. Any ideas? The regex could be wrong, but i just have no idea
I can post the exception later, dont have access to the cluster at the moment