I recently had to solve a problem in a real data system involving a nested dict/list combination. I worked on this for quite a while and came up with a solution, but I am very unsatisfied with it: I had to resort to using globals() and a named temporary global variable.
I do not like to use globals. That's just asking for an injection vulnerability. I feel that there must be a better way to perform this task without resorting to globals.
Problem Dataset:
d = {
    "k":1,
    "stuff":"s1",
    "l":{"m":[
        {
            "k":2,
            "stuff":"s2",
            "l":None
        },
        {
            "k":3,
            "stuff":"s3",
            "l":{"m":[
                {
                    "k":4,
                    "stuff":"s4",
                    "l":None
                },
                {
                    "k":5,
                    "stuff":"s5",
                    "l":{"m":[
                        {
                            "k":6,
                            "stuff":"s6",
                            "l":None
                        },
                    ]}
                },
            ]}
        },
    ]}
}
Desired Output:
[{'k': 1, 'stuff': 's1'},
{'k': 2, 'stuff': 's2'},
{'k': 3, 'stuff': 's3'},
{'k': 4, 'stuff': 's4'},
{'k': 5, 'stuff': 's5'},
{'k': 6, 'stuff': 's6'}]
My Solution:
def _get_recursive_results(d, iter_key, get_keys):
    if not 'h' in globals():
        global h
        h = []
    h.append({k:d.get(k) for k in get_keys})
    d2 = d.copy()
    for k in iter_key:
        if not d2:
            continue
        d2 = d2.get(k)
    for td in d2:
        d3 = td.copy()
        for k in iter_key:
            if not d3:
                continue
            d3 = d3.get(k)
        if d3:
            return _get_recursive_results(td, iter_key, get_keys)
        h.append({k:td.get(k) for k in get_keys})
    else:
        l = [k for k in h]
        del globals()['h']
        return l
Calling my function as follows returns the desired result:
_get_recursive_results(d, ['l','m'], ['k','stuff'])
How would I build a better solution?
Take a look at https://github.com/akesterson/dpath-python/blob/master/README.rst
It is a nice way of searching over a dict.
This is a slightly modified version without using globals. Set h to None as default and create a new list for the first call to _get_recursive_results(). Later, provide h as an argument in the recursive calls to _get_recursive_results().
There is no need for the copying of intermediate dicts. This is a further modified version without copying:
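One way this could look, as a sketch rather than a definitive version. The helper _descend and the name sample are hypothetical, and this variant also recurses into every child instead of returning at the first child that has sub-items, which is an assumption about the intended behaviour:

```python
# Trimmed-down version of the question's data (illustrative only).
sample = {"k": 1, "stuff": "s1", "l": {"m": [
    {"k": 2, "stuff": "s2", "l": None},
    {"k": 3, "stuff": "s3", "l": {"m": [
        {"k": 4, "stuff": "s4", "l": None},
    ]}},
]}}


def _descend(node, iter_key):
    """Follow the keys in iter_key; return None if the path ends early."""
    for k in iter_key:
        if not node:
            return None
        node = node.get(k)
    return node


def get_recursive_results(d, iter_key, get_keys, h=None):
    if h is None:
        h = []
    h.append({k: d.get(k) for k in get_keys})
    # Nothing is modified, so the dicts are only read, never copied.
    for child in _descend(d, iter_key) or []:
        get_recursive_results(child, iter_key, get_keys, h)
    return h


print(get_recursive_results(sample, ['l', 'm'], ['k', 'stuff']))
```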
Use generator
A generator can yield each node's values while walking the structure; you get the results by consuming it, for example with list().
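A sketch of such a generator, written without yield from so it also runs on older Python versions. The names iter_results and sample are made up for this illustration:

```python
# Trimmed-down version of the question's data (illustrative only).
sample = {"k": 1, "stuff": "s1", "l": {"m": [
    {"k": 2, "stuff": "s2", "l": None},
    {"k": 3, "stuff": "s3", "l": {"m": [
        {"k": 4, "stuff": "s4", "l": None},
    ]}},
]}}


def iter_results(d, iter_key, get_keys):
    # Yield this node's values first, then walk down to its children.
    yield {k: d.get(k) for k in get_keys}
    node = d
    for k in iter_key:
        if not node:
            break
        node = node.get(k)
    for child in node or []:
        # Re-yield everything produced by the recursive call.
        for item in iter_results(child, iter_key, get_keys):
            yield item


results = list(iter_results(sample, ['l', 'm'], ['k', 'stuff']))
print(results)
```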
Python 3.3 provides the new yield from expression, used to delegate yielding to a subgenerator. Using this expression, the loop that re-yields each item from the recursive call becomes a single statement, so the code can be one line shorter.
Some methods to avoid globals
generators
Often, if you need to build up a list and are looking for a replacement for global variables, generators come in handy: they keep the state of the current work in their local variables, and building up the whole result is postponed until the generated values are consumed.
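To illustrate the point about postponed work, here is a small sketch using yield from (Python 3.3+). The names children_of, iter_results and sample are assumptions made for this example:

```python
# Trimmed-down version of the question's data (illustrative only).
sample = {"k": 1, "stuff": "s1", "l": {"m": [
    {"k": 2, "stuff": "s2", "l": None},
    {"k": 3, "stuff": "s3", "l": {"m": [
        {"k": 4, "stuff": "s4", "l": None},
    ]}},
]}}


def children_of(node, iter_key):
    """Follow iter_key from node and return the list found there, or []."""
    for k in iter_key:
        if not node:
            return []
        node = node.get(k)
    return node or []


def iter_results(d, iter_key, get_keys):
    yield {k: d.get(k) for k in get_keys}
    for child in children_of(d, iter_key):
        # Delegate to the sub-generator instead of looping over it.
        yield from iter_results(child, iter_key, get_keys)


gen = iter_results(sample, ['l', 'm'], ['k', 'stuff'])
first = next(gen)   # only the root node has been processed so far
rest = list(gen)    # the remaining nodes are produced on demand
print(first, rest)
```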
recursion
Recursion stores the sub-results in local variables on the stack.
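As a sketch of this idea, each call below builds its own local list and returns it to the caller, so no shared accumulator is needed. The names collect and sample are hypothetical:

```python
# Trimmed-down version of the question's data (illustrative only).
sample = {"k": 1, "stuff": "s1", "l": {"m": [
    {"k": 2, "stuff": "s2", "l": None},
    {"k": 3, "stuff": "s3", "l": {"m": [
        {"k": 4, "stuff": "s4", "l": None},
    ]}},
]}}


def collect(d, iter_key, get_keys):
    # 'results' is local to this call; it lives on the call stack.
    results = [{k: d.get(k) for k in get_keys}]
    node = d
    for k in iter_key:
        if not node:
            break
        node = node.get(k)
    for child in node or []:
        # Merge the sub-result returned by the deeper call.
        results.extend(collect(child, iter_key, get_keys))
    return results


print(collect(sample, ['l', 'm'], ['k', 'stuff']))
```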
class instance with internal property
A class can serve as a container to encapsulate your variables.
Instead of using a global variable, you store the intermediate result in an instance property.
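A minimal sketch of the class-based variant. The class name ResultCollector, its method visit, and the name sample are all invented for this example:

```python
# Trimmed-down version of the question's data (illustrative only).
sample = {"k": 1, "stuff": "s1", "l": {"m": [
    {"k": 2, "stuff": "s2", "l": None},
    {"k": 3, "stuff": "s3", "l": {"m": [
        {"k": 4, "stuff": "s4", "l": None},
    ]}},
]}}


class ResultCollector:
    def __init__(self, iter_key, get_keys):
        self.iter_key = iter_key
        self.get_keys = get_keys
        self.results = []  # intermediate result lives on the instance

    def visit(self, node):
        self.results.append({k: node.get(k) for k in self.get_keys})
        children = node
        for key in self.iter_key:
            if not children:
                break
            children = children.get(key)
        for child in children or []:
            self.visit(child)
        return self.results


collector = ResultCollector(['l', 'm'], ['k', 'stuff'])
print(collector.visit(sample))
```

Each traversal should use its own instance; reusing one collector would append to the results of the previous run.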
Generalize for different data structures
In your comments you mentioned that you receive many different types with each dump.
I will assume that your data fulfill the following expectation: each node provides a value (e.g. {"k": xx, "stuff": yy}) and may contain subitems.
One option to make the solution more general is to provide a list of keys used to access the value/subitems; another option is to provide functions which do the work of getting the node value and subitems.
Here I use get_value to deliver the node value and get_subitems to deliver the subnodes. The processing is then done by a function that receives both of them and is called with the data to process.
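A sketch of how these pieces might fit together. The names get_value and get_subitems come from the text above; the function name walk, the bodies of all three functions, and the name sample are assumptions made for this example:

```python
# Trimmed-down version of the question's data (illustrative only).
sample = {"k": 1, "stuff": "s1", "l": {"m": [
    {"k": 2, "stuff": "s2", "l": None},
    {"k": 3, "stuff": "s3", "l": {"m": [
        {"k": 4, "stuff": "s4", "l": None},
    ]}},
]}}


def get_value(node):
    """Deliver the value of a node (specific to this data layout)."""
    return {"k": node["k"], "stuff": node["stuff"]}


def get_subitems(node):
    """Deliver the list of subnodes, or an empty list for a leaf."""
    link = node.get("l")
    return link["m"] if link else []


def walk(node, get_value, get_subitems):
    # The traversal knows nothing about the layout; the two
    # functions passed in carry all structure-specific knowledge.
    yield get_value(node)
    for sub in get_subitems(node):
        yield from walk(sub, get_value, get_subitems)


results = list(walk(sample, get_value, get_subitems))
print(results)
```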
The advantage of using functions is that it is much more flexible for whatever data structures you have to process: adapting to other data structures would require only providing customized versions of the functions get_value and get_subitems, having the same or different names according to your preferences.
Edit: First version had a bug which is now corrected.
I believe this should work, we're using the power of recursion!
I verified it works. Please check it. Of course, it should be modified when you change the structure of dictionary-list.
This is not as generic but it does the job:
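One non-generic way to do it could look like the sketch below, with the keys "k", "stuff", "l" and "m" hard-coded and an explicit stack instead of recursion. The names flatten and sample are hypothetical:

```python
# Trimmed-down version of the question's data (illustrative only).
sample = {"k": 1, "stuff": "s1", "l": {"m": [
    {"k": 2, "stuff": "s2", "l": None},
    {"k": 3, "stuff": "s3", "l": {"m": [
        {"k": 4, "stuff": "s4", "l": None},
    ]}},
]}}


def flatten(d):
    out = []
    stack = [d]
    while stack:
        node = stack.pop()
        out.append({"k": node["k"], "stuff": node["stuff"]})
        if node["l"]:
            # Push children in reverse so they are popped in original order.
            stack.extend(reversed(node["l"]["m"]))
    return out


print(flatten(sample))
```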