Let's create a simple external table over JSON data in Hive:
hive> CREATE EXTERNAL TABLE tweets (id BIGINT, id_str STRING, user STRUCT<id:BIGINT, screen_name:STRING>)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
LOCATION '/projets/tweets';
OK
Time taken: 2.253 seconds
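For reference, the contrib SerDe class used above is not on Hive's classpath by default and has to be registered first; a minimal sketch (the JAR path is an assumption and depends on the installation):

```sql
-- Assumption: adjust the path to wherever hive-contrib is installed.
ADD JAR /usr/lib/hive/lib/hive-contrib-1.2.1.jar;
```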
hive> describe tweets.user;
OK
id bigint from deserializer
screen_name string from deserializer
Time taken: 1.151 seconds, Fetched: 2 row(s)
I cannot figure out where the error is here:
hive> select user.id from tweets limit 5;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating user.id
Time taken: 0.699 seconds
I am using Hive version 1.2.1.
I finally found the answer. It seems to be a problem with the JAR used to serialize/deserialize the JSON. The default one (Apache's) does not do a good job on the data I have.

I tried all of these typical JARs (in parentheses, the class for 'ROW FORMAT SERDE'):

All of them gave me different kinds of errors. I list them here so the next person can Google them:

Finally, the working JAR is json-serde-1.3-jar-with-dependencies.jar, which can be found here. This one works with 'STRUCT' and can even ignore some malformed JSON. I also had to use this class for the creation of the table:

If needed, it can be recompiled from here or here. I tried the first repository and it compiles fine for me after adding the necessary libs. The repository has also been updated recently.
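For reference, a minimal sketch of the working setup with that JAR. The SerDe class name (org.openx.data.jsonserde.JsonSerDe) and the ignore.malformed.json property come from the Hive-JSON-Serde project that ships json-serde-1.3-jar-with-dependencies.jar; the JAR path is an assumption:

```sql
-- Register the working SerDe JAR (path is an assumption; adjust to where you put it).
ADD JAR /usr/lib/hive/lib/json-serde-1.3-jar-with-dependencies.jar;

CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  id_str STRING,
  user STRUCT<id:BIGINT, screen_name:STRING>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
-- Skip records that are not valid JSON instead of failing the query.
WITH SERDEPROPERTIES ("ignore.malformed.json" = "true")
LOCATION '/projets/tweets';
```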