I know this question has been asked many times. I tried several solutions but I couldn't solve my problem.
I have a large nested JSON file (1.4GB) and I would like to make it flat and then convert it to a CSV file.
The JSON structure is like this:
{
    "company_number": "12345678",
    "data": {
        "address": {
            "address_line_1": "Address 1",
            "locality": "Henley-On-Thames",
            "postal_code": "RG9 1DP",
            "premises": "161",
            "region": "Oxfordshire"
        },
        "country_of_residence": "England",
        "date_of_birth": {
            "month": 2,
            "year": 1977
        },
        "etag": "26281dhge33b22df2359sd6afsff2cb8cf62bb4a7f00",
        "kind": "individual-person-with-significant-control",
        "links": {
            "self": "/company/12345678/persons-with-significant-control/individual/bIhuKnFctSnjrDjUG8n3NgOrl"
        },
        "name": "John M Smith",
        "name_elements": {
            "forename": "John",
            "middle_name": "M",
            "surname": "Smith",
            "title": "Mrs"
        },
        "nationality": "Vietnamese",
        "natures_of_control": [
            "ownership-of-shares-50-to-75-percent"
        ],
        "notified_on": "2016-04-06"
    }
}
I know that this is easy to accomplish with the pandas module, but I am not familiar with it.
EDITED
The desired output should be something like this:
company_number, address_line_1, locality, country_of_residence, kind,
12345678, Address 1, Henley-On-Thames, England, individual-person-with-significant-control
Note that this is just the short version. The output should have all the fields.
For the JSON data you have given, you could do this by parsing the JSON structure to just return a list of all the leaf nodes.
This assumes that your structure is consistent throughout. If each entry can have different fields, see the second approach.
For example:
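A minimal sketch of the leaf-node approach, not a definitive implementation. The names get_leaves and write_flat_csv and the cut-down sample entry are assumptions made for illustration. Each column is named after the leaf key only, so identically named leaves in different branches would produce duplicate column names.

```python
import csv
import io


def get_leaves(item, key=None):
    """Return a list of (key, value) pairs for every leaf below item."""
    if isinstance(item, dict):
        leaves = []
        for child_key, child in item.items():
            leaves.extend(get_leaves(child, child_key))
        return leaves
    if isinstance(item, list):
        # List elements inherit the key of the list that contains them.
        leaves = []
        for child in item:
            leaves.extend(get_leaves(child, key))
        return leaves
    return [(key, item)]


def write_flat_csv(entries, f_output):
    """Write one CSV row per entry; the header is taken from the first entry."""
    csv_output = csv.writer(f_output)
    write_header = True
    for entry in entries:
        # Sort by key so every row uses the same column order.
        leaf_entries = sorted(get_leaves(entry), key=lambda pair: pair[0])
        if write_header:
            csv_output.writerow([k for k, _ in leaf_entries])
            write_header = False
        csv_output.writerow([v for _, v in leaf_entries])


# Hypothetical, shortened entry in the same shape as the question's data.
sample = [
    {
        "company_number": "12345678",
        "data": {
            "address": {"locality": "Henley-On-Thames"},
            "date_of_birth": {"month": 2, "year": 1977},
        },
    }
]

buffer = io.StringIO()
write_flat_csv(sample, buffer)
print(buffer.getvalue())
```

For the real file, the same function could be called with the result of json.load() on the input file and an output file opened with newline=''. Note that json.load() reads the whole document into memory, which may be a concern for a 1.4GB file.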
If your JSON data is a list of entries in the format you have given, then the output should contain one header row of leaf field names followed by one row of values per entry.
If each entry can contain different (or possibly missing) fields, then a better approach would be to use a DictWriter. In this case, all of the entries would need to be processed to determine the complete list of possible fieldnames so that the correct header can be written.
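A sketch of the DictWriter approach under the same assumptions. The names get_leaves and write_dict_csv and the sample entries are hypothetical. It makes one pass to collect every field name and then writes the rows, leaving missing fields empty. Because each entry becomes a dict, a key that occurs more than once in an entry keeps only its last value.

```python
import csv
import io


def get_leaves(item, key=None):
    """Return a list of (key, value) pairs for every leaf below item."""
    if isinstance(item, dict):
        leaves = []
        for child_key, child in item.items():
            leaves.extend(get_leaves(child, child_key))
        return leaves
    if isinstance(item, list):
        leaves = []
        for child in item:
            leaves.extend(get_leaves(child, key))
        return leaves
    return [(key, item)]


def write_dict_csv(entries, f_output):
    """Write entries whose fields may differ, using a shared header."""
    # First pass: flatten every entry (repeated keys keep the last value).
    rows = [dict(get_leaves(entry)) for entry in entries]
    # Collect the union of all field names so the header is complete.
    fieldnames = sorted({name for row in rows for name in row})
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames, restval="")
    csv_output.writeheader()
    # Second pass: fields missing from a row are written as restval.
    csv_output.writerows(rows)


# Hypothetical entries with different sets of fields.
sample = [
    {"company_number": "1", "data": {"name": "A"}},
    {"company_number": "2", "data": {"nationality": "British"}},
]

buffer = io.StringIO()
write_dict_csv(sample, buffer)
print(buffer.getvalue())
```

Holding all flattened rows in memory keeps the sketch short. For a very large input, the field names could instead be collected in one pass over the file and the rows written in a second pass.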