Loading LinkedIn JSON response into Hive

Posted 2019-06-04 18:55

Question:

EDIT: Changed the HQL statement to map to the JSON structure, but the error persists.

I have tried multiple ways to create the Hive table and retrieve data using JSONSerDe, but these are the errors I encounter:

hive> select * from jobs;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException: No content to map to Object due to end of input

hive> select values from jobs;

Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)

Here is the table creation statement:

create external table jobs (
  jobs STRUCT<
    values : ARRAY<STRUCT<
      id : STRING,
      customerJobCode : STRING,
      postingDate : STRING,
      expirationDate : STRING,
      company : STRUCT<
        id : STRING,
        name : STRING>,
      position : STRUCT<
        title : STRING,
        jobFunctions : STRING,
        industries : STRING,
        jobType : STRING,
        experienceLevel : STRING>,
      skillsAndExperience : ARRAY<STRING>,
      descriptionSnippet : ARRAY<STRING>,
      salary : STRING,
      jobPoster : STRUCT<
        id : STRING,
        firstName : STRING,
        lastName : STRING,
        headline : STRING>,
      referralBonus : STRING,
      locationDescription : STRING>>>
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/sunita/tables/jobs';

The raw input file is here: https://gist.github.com/anonymous/e2c15d808bbe46b707bf/raw/88d775cb418901807980c52e803ffc8be53adc5f/jobsearch.json

I tried leaving 'values' (an array of structs) out of the table description. I also tried removing 'values' from both the input file and the table creation statement. With this approach there are no errors, but as one would expect, only one entry gets into the table and everything else comes back as null. Hive treats the input as a single record, which causes this issue.

I tried simplifying the input to select fewer fields, but I still get the same error when retrieving the information. Any help in this regard is truly appreciated.

I also verified that the JSON string is valid using the Notepad++ JSON plugin.

Answer 1:

The problem was a newline at the end of the input file. Making sure that I eliminated any characters after the end of the data resolved the issue.
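
For anyone hitting the same EOFException, here is a minimal sketch of that cleanup in Python. It is not part of the original fix, which was done by hand. It assumes the input holds one JSON document per line, which is how line-oriented JSON SerDes typically read their input. The function names clean_json_lines and count_records are made up for illustration.

```python
import json


def clean_json_lines(text: str) -> str:
    """Remove blank lines and surrounding whitespace from every line.

    Assumption: each remaining line is one complete JSON document.
    The result has no trailing newline, so no empty record is left
    at the end of the data.
    """
    stripped = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in stripped if line)


def count_records(text: str) -> int:
    """Parse every line as JSON and return the number of records.

    Raises json.JSONDecodeError (a ValueError) on the first line
    that is not valid JSON.
    """
    count = 0
    for line in text.splitlines():
        json.loads(line)
        count += 1
    return count
```

To use it, read the local copy of the input file into a string, pass it through clean_json_lines, run count_records on the result as a sanity check, and write the cleaned text back before uploading it to the table's LOCATION directory.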