I am trying to load the json file to pandas data frame. I found that there were some nested json. Below is the sample json:
{'events': [{'id': 142896214,
'playerId': 37831,
'teamId': 3157,
'matchId': 2214569,
'matchPeriod': '1H',
'eventSec': 0.8935539999999946,
'eventId': 8,
'eventName': 'Pass',
'subEventId': 85,
'subEventName': 'Simple pass',
'positions': [{'x': 51, 'y': 49}, {'x': 40, 'y': 53}],
'tags': [{'id': 1801, 'tag': {'label': 'accurate'}}]}
I used the following code to load json into dataframe:
with open('EVENTS.json') as f:
jsonstr = json.load(f)
df = pd.io.json.json_normalize(jsonstr['events'])
Below is the output of df.head()
But I found two nested columns such as positions and tags.
I tried using the following code to flatten it:
Position_data = json_normalize(data =jsonstr['events'], record_path='positions', meta = ['x','y','x','y'] )
It showed me an error as follow:
KeyError: "Try running with errors='ignore' as key 'x' is not always present"
Can you advise me how to flatten positions and tags ( those having nested data).
Thanks, Zep
If you are looking for a more general way to unfold multiple hierarchies from a json you can use
recursion
and list comprehension to reshape your data. One alternative is presented below:Then you can apply to your data, independent of nested levels:
New sample data
Usage
Note that this
flatten_json
code is not mine, I have seen it here and here without much certainty of the original source.Create the DataFrame
Flatten
positions
withpd.Series
Rename
positions[0]
&positions[1]
:Flatten
tags
withpd.Series
:Rename
id
&label
:Combine them all with
pd.concat
:Drop the old columns: