Let's create a simple external table over JSON data in Hive:
hive> CREATE EXTERNAL TABLE tweets (id BIGINT, id_str STRING, user STRUCT<id:BIGINT, screen_name:STRING>)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
LOCATION '/projets/tweets';
OK
Time taken: 2.253 seconds
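For reference, the contrib SerDe class used above is not on Hive's classpath by default and has to be registered first; a minimal sketch (the JAR path is an assumption and depends on the installation):

```sql
-- Assumption: adjust the path to wherever hive-contrib is installed.
ADD JAR /usr/lib/hive/lib/hive-contrib-1.2.1.jar;
```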
hive> describe tweets.user;
OK
id bigint from deserializer
screen_name string from deserializer
Time taken: 1.151 seconds, Fetched: 2 row(s)
I cannot figure out where the error is here:
hive> select user.id from tweets limit 5;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating user.id
Time taken: 0.699 seconds
I am using Hive version 1.2.1.
I finally found the answer. It seems to be a problem with the JAR used to serialize/deserialize the JSON. The default one (Apache's) does not do a good job on the data I have.

I tried all of these typical JARs (in parentheses, the class for 'ROW FORMAT SERDE'):

All of them gave me different kinds of errors. I list them here so the next person can Google them:

Finally, the working JAR is json-serde-1.3-jar-with-dependencies.jar, which can be found here. This one works with 'STRUCT' and can even ignore some malformed JSON. I also had to use this class for the creation of the table:

If needed, it can be recompiled from here or here. I tried the first repository and it compiles fine for me after adding the necessary libs. The repository has also been updated recently.
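For reference, a minimal sketch of the working setup with that JAR. The SerDe class name (org.openx.data.jsonserde.JsonSerDe) and the ignore.malformed.json property come from the Hive-JSON-Serde project that ships json-serde-1.3-jar-with-dependencies.jar; the JAR path is an assumption:

```sql
-- Register the working SerDe JAR (path is an assumption; adjust to where you put it).
ADD JAR /usr/lib/hive/lib/json-serde-1.3-jar-with-dependencies.jar;

CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  id_str STRING,
  user STRUCT<id:BIGINT, screen_name:STRING>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
-- Skip records that are not valid JSON instead of failing the query.
WITH SERDEPROPERTIES ("ignore.malformed.json" = "true")
LOCATION '/projets/tweets';
```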