U- SQL Unable to extract data from JSON file

2019-01-28 00:28发布

问题:

I was trying to extract data from a JSON file using USQL. Either the query runs successfully without producing any output data or results in "vertex failed fast error".

The JSON file looks like:

{
  "results": [
    {
      "name": "Sales/Account",
      "id": "7367e3f2-e1a5-11e5-80e8-0933ecd4cd8c",
      "deviceName": "HP",
      "deviceModel": "g6-pavilion",
      "clientip": "0.41.4.1"
    },
    {
      "name": "Sales/Account",
      "id": "c01efba0-e0d5-11e5-ae20-af6dc1f2c036",
      "deviceName": "acer",
      "deviceModel": "veriton",
      "clientip": "10.10.14.36"
    }
  ]
}

And my U-SQL script is

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @in string="adl://xyz.azuredatalakestore.net/todelete.json";

DECLARE @out string="adl://xyz.azuredatalakestore.net/todelete.tsv";

@trail2=EXTRACT results string FROM @in USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

@jsonify=SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(results,"name","id","deviceName","deviceModel","clientip") AS rec FROM @trail2;

@logSchema=SELECT rec["name"] AS sysName,
              rec["id"] AS sysId,
              rec["deviceName"] AS domainDeviceName,
              rec["deviceModel"] AS domainDeviceModel,
              rec["clientip"] AS domainClientIp 
       FROM @jsonify;

OUTPUT @logSchema TO @out USING Outputters.Tsv();

回答1:

Actually the JSONExtractor supports the rowpath parameter expressed in JSONPath that gives you the ability to identify the JSON object or JSON array items that you want to map into rows. So you can extract your data with a single statement from your JSON document:

@logSchema = 
    EXTRACT name string, id string, deviceName string, deviceModel string, clientip string
    FROM @input
   USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("results[*]");


回答2:

Sarath,

the problem was that your @trail2 output was a json array "[{...},{...}]" which the JsonFunction can't parse, as far as I can tell. So I outputted that to a file and reread it with the inputter, which can parse arrays.

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @in string="adl://xyz.azuredatalakestore.net/todelete.json";
DECLARE @out string="adl://xyz.azuredatalakestore.net/todelete.tsv";
DECLARE @mid string="adl://xyz.azuredatalakestore.net/intermediate.txt";


@trail2=EXTRACT results string FROM @in USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

OUTPUT @trail2 TO @mid USING Outputters.Text(quoting:false);

@jsonify=EXTRACT name string,
                id string, 
                deviceName string ,
                deviceModel string,
                clientip string
FROM @mid USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

@logSchema=SELECT name AS sysName,
              id AS sysId,
              deviceName AS domainDeviceName,
              deviceModel AS domainDeviceModel,
              clientip AS domainClientIp 
       FROM @jsonify;

OUTPUT @logSchema TO @out USING Outputters.Tsv();