There is an issue when executing a show create table and then executing the resulting create table statement if the table is ORC. Using show create table, you get this:
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
But if you create the table with those clauses, you will then get a casting error when selecting from it. The error looks like this:
Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to org.apache.hadoop.io.BinaryComparable
To fix this, just change the create table statement to use STORED AS ORC.
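For example, a minimal sketch of the working statement (the table name and columns here are just placeholders):

CREATE TABLE my_table (id INT, name STRING)
STORED AS ORC;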
But even after reading the answer to the similar question What is the difference between 'InputFormat, OutputFormat' & 'Stored as' in Hive?, I can't figure out the reason.
STORED AS implies 3 things: the SERDE, the INPUTFORMAT, and the OUTPUTFORMAT. You have defined only the last 2, leaving the SERDE to be defined by hive.default.serde.
Demo
hive.default.serde
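A minimal sketch of checking this setting from the Hive CLI (the value shown is the stock default; your configuration may differ):

hive> set hive.default.serde;
hive.default.serde=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe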
STORED AS ORC
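A sketch of creating a table this way and inspecting it (the table name t and its single column are hypothetical; the output is abridged):

hive> create table t (i int) stored as orc;
hive> show create table t;
...
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
...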
Note that the SERDE is 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'.
STORED AS INPUTFORMAT ... OUTPUTFORMAT ...
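The same check for a table defined with only the input and output format classes (the table name t2 is hypothetical; output abridged):

hive> create table t2 (i int)
      stored as inputformat 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
      outputformat 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
hive> show create table t2;
...
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
...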
Note that the SERDE is 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'.
You can specify the INPUTFORMAT, OUTPUTFORMAT, and SERDE explicitly when creating a table. Hive allows you to separate your record format from your file format, and you can provide custom classes for the INPUTFORMAT, OUTPUTFORMAT, and SERDE. See details: http://www.dummies.com/programming/big-data/hadoop/defining-table-record-formats-in-hive/
Alternatively, you can simply write STORED AS ORC or STORED AS TEXTFILE, for example. STORED AS ORC already takes care of the INPUTFORMAT, OUTPUTFORMAT, and SERDE, so you don't have to write those long fully qualified Java class names; just write STORED AS ORC instead.
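So, assuming the same placeholder names as above, the short form below should give the same ORC table as the explicit definition:

CREATE TABLE my_orc_table (id INT, name STRING)
STORED AS ORC;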