I store the following JSON objects in a Hive table:
{
"main_id": "qwert",
"features": [
{
"scope": "scope1",
"name": "foo",
"value": "ab12345",
"age": 50,
"somelist": ["abcde","fghij"]
},
{
"scope": "scope2",
"name": "bar",
"value": "cd67890"
},
{
"scope": "scope3",
"name": "baz",
"value": [
"A",
"B",
"C"
]
}
]
}
"features" is an array of varying length, i.e. all objects are optional. The objects have arbitrary elements, but all of them contain "scope", "name" and "value".
This is the Hive table I created:
CREATE TABLE tbl(
main_id STRING,features array<struct<scope:STRING,name:STRING,value:array<STRING>,age:INT,somelist:array<STRING>>>
)
I need a Hive query that returns the main_id and the value of the struct with the name "baz", i.e.,
main_id baz_value
qwert ["A","B","C"]
My problem is that the Hive UDF "get_json_object" supports only a limited version of JSONPath. It does not support a path like get_json_object(features, '$.features[?(@.name='baz')]')
.
How can query the wanted result with Hive? Is it maybe easier with another Hive table structure?
The UDF pasted below I think is close to your needs. It takes
array<struct>
, a string, and an integer. String is the field name, in your case "name", and the third argument is the value to match on. Currently it expects an integer but it should be relatively easy to change this to string / text for your purpose.Here is the code for this and a couple other working udfs that do things with
array<struct>
.This one could be ok.
ParseJsonWithPath
The field to be parsed can be a json string(jsonStr), given the xpath, you can get what you want.
For example
more detail
I found a solution for this:
Use the Hive explode UDTF to explode the struct array, i.e., create a second (temporary) table with one record for each struct in the array "features".
The result of this is:
Now you can select the main_id and value like this: