I have a JSON where the order of fields is not fixed.
i.e. I can have [A, B, C] or [B, C, A]
All A, B, C are json objects are of the form {Name: x, Value:y}.
So, when I use USQL to extract the JSON (I don't know their order) and put it into a CSV (for which I will need column name):
@output =
SELECT
A["Value"] ?? "0" AS CAST ### (("System_" + A["Name"]) AS STRING),
B["Value"] ?? "0" AS "System_" + B["Name"],
System_da
So, I am trying to put column name as the "Name" field in the JSON.
But am getting the error at #### above:
Message
syntax error. Expected one of: FROM ',' EXCEPT GROUP HAVING INTERSECT OPTION ORDER OUTER UNION UNION WHERE ';' ')'
Resolution
Correct the script syntax, using expected token(s) as a guide.
Description
Invalid syntax found in the script.
Details
at token '(', line 74
near the ###:
**************
I am not allowed to put the correct column name "dynamically" and it is an absolute necessity of my issue.
Input: [A, B, C,], [C, B, A]
Output: A.name B.name C.name
Row 1's values
Row 2's values
This
@output =
SELECT
A["Value"] ?? "0" AS CAST ### (("System_" + A["Name"]) AS STRING),
B["Value"] ?? "0" AS "System_" + B["Name"],
System_da
is not a valid SELECT clause (neither in U-SQL nor any other SQL dialect I am aware of).
What is the JSON Array? Is it a key/value pair? Or positional? Or a single value in the array that you want to have a marker for whether it is present in the array?
From your example, it seems that you want something like:
Input:
[["A","B","C"],["C","D","B"]]
Output:
A B C D
true true true false
false true true true
If that is the case, I would write it as:
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
USING Microsoft.Analytics.Samples.Formats.Json;
@input =
SELECT "[[\"A\", \"B\", \"C\"],[\"C\", \"D\", \"B\"]]" AS json
FROM (VALUES (1)) AS T(x);
@data =
SELECT JsonFunctions.JsonTuple(arrstring) AS a
FROM @input CROSS APPLY EXPLODE( JsonFunctions.JsonTuple(json).Values) AS T(arrstring);
@data =
SELECT a.Contains("A") AS A, a.Contains("B") AS B, a.Contains("C") AS C, a.Contains("D") AS D
FROM (SELECT a.Values AS a FROM @data) AS t;
OUTPUT @data
TO "/output/data.csv"
USING Outputters.Csv(outputHeader : true);
If you need something more dynamic, either use the resulting SqlArray
or SqlMap
or use the above approach to generate the script.
However, I wonder why you would model your information this way in the first place. I would recommend finding a more appropriate way to mark the presence of the value in the JSON.
UPDATE: I missed your comment about that the inner array members are an object with two key-value pairs, where one is always called name (for property) and one is always called value for the property value. So here is the answer for that case.
First: Modelling key value pairs in JSON using {"Name": "propname", "Value" : "value"}
is a complete misuse of the flexible modelling capabilities of JSON and should not be done. Use {"propname" : "value"}
instead if you can.
So changing the input, the following will give you the pivoted values. Note that you will need to know the values ahead of time and there are several options on how to do the pivot. I do it in the statement where I create the new SqlMap
instance to reduce the over-modelling, and then in the next SELECT
where I get the values from the map.
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
USING Microsoft.Analytics.Samples.Formats.Json;
@input =
SELECT "[[{\"Name\":\"A\", \"Value\": 1}, {\"Name\": \"B\", \"Value\": 2}, {\"Name\": \"C\", \"Value\":3 }], [{\"Name\":\"C\", \"Value\": 4}, {\"Name\":\"D\", \"Value\": 5}, {\"Name\":\"B\", \"Value\": 6}]]" AS json
FROM (VALUES (1)) AS T(x);
@data =
SELECT JsonFunctions.JsonTuple(arrstring) AS a
FROM @input CROSS APPLY EXPLODE( JsonFunctions.JsonTuple(json)) AS T(rowid, arrstring);
@data =
SELECT new SqlMap<string, string>(
a.Values.Select((kvp) =>
new KeyValuePair<string, string>(
JsonFunctions.JsonTuple(kvp)["Name"]
, JsonFunctions.JsonTuple(kvp)["Value"])
)) AS kvp
FROM @data;
@data =
SELECT kvp["A"] AS A,
kvp["B"] AS B,
kvp["C"] AS C,
kvp["D"] AS D
FROM @data;
OUTPUT @data
TO "/output/data.csv"
USING Outputters.Csv(outputHeader : true);