How to convert nested JSON to CSV using only jq

Posted 2020-03-07 06:44

Question:

I have the following JSON:

{
    "A": {
        "C": {
            "D": "T1",
            "E": 1
        },
        "F": {
            "D": "T2",
            "E": 2
        }
    },
    "B": {
        "C": {
            "D": "T3",
            "E": 3
        }
    }
}

I want to convert it to CSV as follows:

A,C,T1,1
A,F,T2,2
B,C,T3,3

Description of the output: the parent keys are printed until a leaf object is reached; once a leaf object is reached, its values are printed.

I tried the following without success:

cat my.json | jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $rows[] | @csv'

and it throws an error.

I can't hardcode the parent keys, because the actual JSON has too many records, but its structure is similar. What am I missing?

Answer 1:

Some of the requirements are unclear, but the following solves one interpretation of the problem:

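paths as $path                               # every path in the input
| {path: $path, value: getpath($path)}       # pair each path with its value
| select(.value|type == "object" )           # keep objects only
| select( [.value[]][0] | type != "object")  # ...whose first value is not an object (leaf objects)
| .path + ([.value[]])                       # parent keys followed by the leaf values
| @csv                                       # format the array as one CSV row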

(This program could be optimized but the presentation here is intended to make the separate steps clear.)

Invocation:

jq -r -f leaves-to-csv.jq input.json

Output:

"A","C","T1",1
"A","F","T2",2
"B","C","T3",3
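
As a sketch of the kind of optimization mentioned above, the same selection can be written more compactly with paths(f) and all/2. This is one possible variant, not the answer's own program: it assumes jq 1.5 or later, it treats a "leaf object" as an object none of whose values are objects (slightly stricter than checking only the first value), and the shell variable names json and out are illustrative only.

```shell
# Sample input from the question, inlined so the example is self-contained.
json='{"A":{"C":{"D":"T1","E":1},"F":{"D":"T2","E":2}},"B":{"C":{"D":"T3","E":3}}}'

out=$(printf '%s' "$json" | jq -r '
  # paths(f) emits only the paths whose value satisfies f:
  # here, objects that contain no nested objects.
  paths(type == "object" and all(.[]; type != "object")) as $p
  | $p + [getpath($p)[]]   # parent keys followed by the leaf values
  | @csv')

printf '%s\n' "$out"
```

For the sample input this should print the same three quoted rows as shown above.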

Unquoted strings

To avoid the quotation marks around strings, you could replace the last component of the pipeline above with:

join(",")
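
For reference, here is the whole pipeline with that replacement, run end to end on the sample input. This is a sketch: the map(tostring) step is an addition that is not in the answer, included on the assumption that it keeps join working on jq versions that only join strings, and the variable names json and out are illustrative.

```shell
# Sample input from the question, inlined so the example is self-contained.
json='{"A":{"C":{"D":"T1","E":1},"F":{"D":"T2","E":2}},"B":{"C":{"D":"T3","E":3}}}'

out=$(printf '%s' "$json" | jq -r '
  paths as $path
  | {path: $path, value: getpath($path)}
  | select(.value | type == "object")
  | select([.value[]][0] | type != "object")
  | .path + [.value[]]
  | map(tostring)   # stringify numbers before joining (added assumption)
  | join(",")')

printf '%s\n' "$out"
```

Unlike @csv, join(",") does no quoting or escaping, so a value that itself contains a comma or a double quote would produce a malformed row.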


Answer 2:

Here is a solution using tostream and group_by:

    [
        tostream
      | select(length == 2)            # e.g. [["A","C","D"],"T1"]
      | .[0][:-1] + [.[1]]             #      ["A","C","T1"]
    ]
    | group_by(.[:-1])                 #    [[["A","C","T1"],["A","C",1]],...
    | .[]                              #     [["A","C","T1"],["A","C",1]]
    | .[0][0:2] + map(.[-1]|tostring)  #      ["A","C","T1","1"]
    | join(",")                        #       "A,C,T1,1"
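
The .[0][0:2] step above assumes exactly two parent keys per row. A hedged variant, assuming leaf objects may sit at different depths, takes everything except the last element instead. The keys X and Y and the value T4 in the sample below are made up for illustration and do not come from the question; json and out are likewise illustrative names.

```shell
# Hypothetical input with leaf objects at two different depths.
json='{"A":{"C":{"D":"T1","E":1}},"B":{"X":{"Y":{"D":"T4","E":4}}}}'

out=$(printf '%s' "$json" | jq -r '
  [ tostream
    | select(length == 2)          # keep [path, leaf] events only
    | .[0][:-1] + [.[1]]           # drop the leaf key, append the leaf value
  ]
  | group_by(.[:-1])               # group rows that share the same parent keys
  | .[]
  | .[0][:-1] + map(.[-1] | tostring)   # all parent keys, then every leaf value
  | join(",")')

printf '%s\n' "$out"
```

Note that group_by sorts the groups by their parent keys, so rows come out in key order rather than in input order.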


Tags: json csv nested jq