U-SQL - Extract data from json-array

2019-01-14 14:18发布

问题:

Already tried the suggested JSONPath option, but it seems the JSONExtractor only recognizes root level. In my case I have to deal with a nested json-structure, with an array as well (see example below). Any options for extracting this without multiple intermediate files?

"relation": {
"relationid": "123456",
"name": "relation1",
"addresses": {
    "address": [{
        "addressid": "1",
        "street": "Street 1",
        "postcode": "1234 AB",
        "city": "City 1"
        },
    {
        "addressid": "2",
        "street": "Street 2",
        "postcode": "5678 CD",
        "city": "City 2"
    }]
}}

SELECT relationid, addressid, street, postcode, city ?

回答1:

After fixing your JSON fragment to:

{
    "relation": {
        "relationid": "123456",
        "name": "relation1",
        "addresses": {
            "address": [{
                "addressid": "1",
                "street": "Street 1",
                "postcode": "1234 AB",
                "city": "City 1"
            }, {
                "addressid": "2",
                "street": "Street 2",
                "postcode": "5678 CD",
                "city": "City 2"
            }]
        }
    }
}

and putting it into a file, the following script will get you what you want. Note that you need to navigate down the structure to carry along the higher level items and once you encounter an array, you CROSS APPLY EXPLODE it if you only need those that have the array, or OUTER APPLY EXPLODE it if you want rows with missing arrays.

DECLARE @input string = @"/temp/stackoverflow.json";

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

@json = 
EXTRACT relationid int, 
        name string, 
        addresses string
  FROM @input 
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("relation");

@relation =
SELECT relationid,
       name,
       Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(addresses)["address"] AS address_array
FROM @json;

@addresses = 
SELECT relationid, 
       name, 
       Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(address) AS address
FROM @relation
     CROSS APPLY
        EXPLODE (Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(address_array).Values) AS A(address);

@result =
SELECT relationid,
       name,
       address["addressid"]AS addressid,
       address["street"]AS street,
       address["postcode"]AS postcode,
       address["city"]AS city
FROM @addresses;

OUTPUT @result
TO "/users/temp/st_out.csv"
USING Outputters.Csv();