I want to iterate through a file and put the contents of each line into a deeply nested dict, the structure of which is defined by leading whitespace. This is very much like the problem documented here. I've solved that part, but now repeated keys overwrite one another instead of having their values collected into a list.
Essentially:
a:
    b: c
    d: e
a:
    b: c2
    d: e2
    d: wrench
is cast into {"a":{"b":"c2","d":"wrench"}}
when it should be cast into
{"a":[{"b":"c","d":"e"},{"b":"c2","d":["e2","wrench"]}]}
A self-contained example:
import json

def jsonify_indented_tree(tree):
    # convert indented text into json
    parsedJson = {}
    parentStack = [parsedJson]
    for i, line in enumerate(tree):
        data = get_key_value(line)
        if data['key'] in parsedJson.keys():  # if parent key is repeated, then cast value as list entry
            # stuff that doesn't work
            # if isinstance(parsedJson[data['key']],list):
            #     parsedJson[data['key']].append(parsedJson[data['key']])
            # else:
            #     parsedJson[data['key']]=[parsedJson[data['key']]]
            print('Hey - Make a list now!')
        if data['value']:  # process child by adding it to its current parent
            currentParent = parentStack[-1]  # .getLastElement()
            currentParent[data['key']] = data['value']
            if i != len(tree) - 1:  # compare by value; `is not` tests identity
                # determine when to switch to next branch
                level_dif = data['level'] - get_key_value(tree[i+1])['level']  # peek next line level
                if level_dif > 0:
                    del parentStack[-level_dif:]  # reached leaf, process next branch
        else:
            # group node, push it as the new parent and keep on processing.
            currentParent = parentStack[-1]  # .getLastElement()
            currentParent[data['key']] = {}
            newParent = currentParent[data['key']]
            parentStack.append(newParent)
    return parsedJson

def get_key_value(line):
    key = line.split(":")[0].strip()
    value = line.split(":")[1].strip()
    level = len(line) - len(line.lstrip())  # number of leading whitespace characters
    return {'key': key, 'value': value, 'level': level}

def pp_json(json_thing, sort=True, indents=4):
    if type(json_thing) is str:
        print(json.dumps(json.loads(json_thing), sort_keys=sort, indent=indents))
    else:
        print(json.dumps(json_thing, sort_keys=sort, indent=indents))
    return None

#nested_string=['a:', '\tb:\t\tc', '\td:\t\te', 'a:', '\tb:\t\tc2', '\td:\t\te2']
#nested_string=['w:','\tgeneral:\t\tcase','a:','\tb:\t\tc','\td:\t\te','a:','\tb:\t\tc2','\td:\t\te2']
nested_string = ['a:',
                 '\tb:\t\tc',
                 '\td:\t\te',
                 'a:',
                 '\tb:\t\tc2',
                 '\td:\t\te2',
                 '\td:\t\twrench']
pp_json(jsonify_indented_tree(nested_string))
This approach is (logically) a lot more straightforward (though longer):
1. Track the level and key-value pair of each line in your multi-line string.
2. Store this data in a level-keyed dict of lists: {level1:[dict1,dict2]}
3. A key-only line contributes just a string holding its key: {level1:[dict1,dict2,"nestKeyA"]}
4. A key-only line means the next line is one level deeper, so that line is stored on the next level: {level1:[dict1,dict2,"nestKeyA"],level2:[...]}. The contents of some deeper level level2 may itself be just another key-only line (and the next loop will add a new level level3 such that it will become {level1:[dict1,dict2,"nestKeyA"],level2:["nestKeyB"],level3:[...]}) or a new dict dict3 such that {level1:[dict1,dict2,"nestKeyA"],level2:[dict3]}
5. Steps 1-4 continue until the current line is indented less than the previous one (signifying a return to some prior scope). This is what the data structure looks like on my example per line iteration.
Then two things need to happen. First, the list of dicts needs to be inspected for duplicate keys, and the values of any duplicated keys combined in a list (this will be demonstrated in a moment). Second, as can be seen between iterations 4 and 5, the list of dicts from the deepest level (here 1) is combined into one dict. Finally, to demonstrate duplicate handling, observe:
where wrench and e2 are placed in a list that itself goes into a dict keyed by their original key.
6. Repeat Steps 1-5, hoisting deeper-scoped dicts up and onto their parent keys until the current line's scope (level) is reached.
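As a compact illustration of the steps above, here is a minimal sketch, separate from the full code. The names merge_dicts, hoist and parse_indented are hypothetical, chosen only for this example. It assumes every key-only line is followed by at least one more deeply indented line, and that incoming values are plain strings or dicts.

```python
def merge_dicts(dict_list):
    """Combine a list of dicts into one dict.

    Values of a repeated key are collected into a list, in order.
    Assumption: incoming values are strings or dicts, never lists.
    """
    merged = {}
    for d in dict_list:
        for key, value in d.items():
            if key not in merged:
                merged[key] = value
            elif isinstance(merged[key], list):
                merged[key].append(value)
            else:
                merged[key] = [merged[key], value]
    return merged


def hoist(levels, deep):
    """Merge the dicts stored at level `deep` and attach the result to the
    key-only string that ends the nearest shallower level."""
    merged = merge_dicts(levels.pop(deep))
    parent = levels[max(k for k in levels if k < deep)]
    parent[-1] = {parent[-1]: merged}


def parse_indented(lines):
    levels = {}  # indentation width -> list of dicts and key-only strings
    for line in lines:
        level = len(line) - len(line.lstrip())
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # Any level deeper than the current line is now closed: fold it upwards.
        for deep in sorted((k for k in levels if k > level), reverse=True):
            hoist(levels, deep)
        # Key-value lines are stored as one-pair dicts, key-only lines as strings.
        levels.setdefault(level, []).append({key: value} if value else key)
    if not levels:
        return {}
    # End of input: fold whatever is still open, deepest level first.
    for deep in sorted(levels, reverse=True)[:-1]:
        hoist(levels, deep)
    return merge_dicts(levels[min(levels)])


nested_string = ['a:', '\tb:\t\tc', '\td:\t\te',
                 'a:', '\tb:\t\tc2', '\td:\t\te2', '\td:\t\twrench']
result = parse_indented(nested_string)
print(result)
```

On the sample from the question, the two a blocks end up as a list of two dicts, and e2 and wrench share a list under d. A key-only line with no children would leave a bare string where hoist and merge_dicts expect a dict, so real input would need a check for that case.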
Here's the code: