I'm having trouble selecting into an ARRAY of STRUCTS in Hive.
My source table looks like this:
+-------------+--+
| field |
+-------------+--+
| id |
| fieldid |
| fieldlabel |
| fieldtype |
| answer_id |
| unitname |
+-------------+--+
This is survey data, where the id is the survey id, the four fields in the middle are response data, and the unitname is the the business unit that the survey pertains to.
I need to create an array of structs for all of the answers for each survey id. I thought this would work, but it doesn't:
select id,
array(
named_struct(
"field_id",
fieldid,
"field_label",
fieldlabel,
"field_type",
fieldtype,
"answer_id",
answer_id,)) as answers,
unitname
from new_answers;
What that returns is each survey answer (field_id) as an array of one struct for that answer, like this:
id | answers | unitname
1 | [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] | Location1
2 | [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069}] | Location2
2 | [{"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070}] | Location2
2 | [{"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] | Location2
But what I need to get to is this:
id | answers | unitname
1 | [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] | Location1
2 | [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069},
{"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070},
{"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] | Location2
I've searched and searched, but all the answers I'm finding seem to have to do with using INSERT INTO....VALUES() queries. I have a table structure already; I just can't get the ARRAY to ARRAY up the way it should.
Any help would be most appreciated.
For reproduction purposes, if need be:
CREATE TABLE `new_answers`(
`id` bigint,
`fieldid` bigint,
`fieldlabel` string,
`fieldtype` string,
`answer_id` bigint,
`unitname` string)
INSERT INTO new_answers VALUES
(1,175877,"Comment","COMMENT",8990947803,"Location1"),
(2,47824,"Language","MULTIPLE_CHOICE",8990950069,"Location2"),
(2,48187,"Language Type","MULTIPLE_CHOICE",8990950070,"Location2"),
(2,47829,"Trans #","TEXT",8990950071,"Location2");
The functionality that you seem to be looking for is to collect the structs into an array. Hive comes with two functions for collecting things into arrays: collect_set and collect_list. However, those functions only work to create arrays of basic types.
The jar for the brickhouse project (https://github.com/klout/brickhouse/wiki/Downloads) provides a number of features, including the ability to collect complex types.
Then you can add the
collect
function using whatever name you like:The following query:
Provides the following result: