Python Recursive Search of Dict with Nested Keys

2020-06-17 07:10发布

I recently had to solve a problem in a real data system with a nested dict/list combination. I worked on this for quite a while and came up with a solution, but I am very unsatisfied. I had to resort to using globals() and a named temporary global parameter.

I do not like to use globals. That's just asking for an injection vulnerability. I feel that there must be a better way to perform this task without resorting to globals.

Problem Dataset:

d = {
    "k":1,
    "stuff":"s1",
    "l":{"m":[
        {
            "k":2,
            "stuff":"s2",
            "l":None
        },
        {
            "k":3,
            "stuff":"s3",
            "l":{"m":[
                {
                    "k":4,
                    "stuff":"s4",
                    "l":None
                },
                {
                    "k":5,
                    "stuff":"s5",
                    "l":{"m":[
                        {
                            "k":6,
                            "stuff":"s6",
                            "l":None
                        },
                    ]}
                },
            ]}
        },
    ]}
}

Desired Output:

[{'k': 1, 'stuff': 's1'},
 {'k': 2, 'stuff': 's2'},
 {'k': 3, 'stuff': 's3'},
 {'k': 4, 'stuff': 's4'},
 {'k': 5, 'stuff': 's5'},
 {'k': 6, 'stuff': 's6'}]

My Solution:

def _get_recursive_results(d, iter_key, get_keys):
    if not 'h' in globals():
        global h
        h = []
    h.append({k:d.get(k) for k in get_keys})

    d2 = d.copy()
    for k in iter_key:
        if not d2:
            continue
        d2 = d2.get(k)

    for td in d2:
        d3 = td.copy()
        for k in iter_key:
            if not d3:
                continue
            d3 = d3.get(k)

        if d3:
            return _get_recursive_results(td, iter_key, get_keys)
        h.append({k:td.get(k) for k in get_keys})
    else:
        l = [k for k in h]
        del globals()['h']
        return l

Calling my function as follows returns the desired result:

_get_recursively(d, ['l','m'], ['k','stuff'])

How would I build a better solution?

6条回答
三岁会撩人
2楼-- · 2020-06-17 07:36

Take a look at https://github.com/akesterson/dpath-python/blob/master/README.rst

It a nice way of searching over a dict

查看更多
\"骚年 ilove
3楼-- · 2020-06-17 07:38

This is a slightly modified version without using globals. Set h to None as default and create a new list for the first call to _get_recursive_results(). Later provide h as an argument in the recursive calls to _get_recursive_results():

def _get_recursive_results(d, iter_key, get_keys, h=None):
    if h is None:
        h = []
    h.append({k:d.get(k) for k in get_keys})
    d2 = d.copy()
    for k in iter_key:
        if not d2:
            continue
        d2 = d2.get(k)
    for td in d2:
        d3 = td.copy()
        for k in iter_key:
            if not d3:
                continue
            d3 = d3.get(k)
        if d3:
            return _get_recursive_results(td, iter_key, get_keys, h)
        h.append({k:td.get(k) for k in get_keys})
    else:
        l = [k for k in h]
        return l

Now:

>>> _get_recursive_results(d, ['l','m'], ['k','stuff'])
[{'k': 1, 'stuff': 's1'},
 {'k': 2, 'stuff': 's2'},
 {'k': 3, 'stuff': 's3'},
 {'k': 4, 'stuff': 's4'},
 {'k': 5, 'stuff': 's5'},
 {'k': 6, 'stuff': 's6'}]

There is no need for the copying of intermediate dicts. This is a further modified version without copying:

def _get_recursive_results(d, iter_key, get_keys, h=None):
    if h is None:
        h = []
    h.append({k: d.get(k) for k in get_keys})
    for k in iter_key:
        if not d:
            continue
        d = d.get(k)
    for td in d:
        d3 = td
        for k in iter_key:
            if not d3:
                continue
            d3 = d3.get(k)
        if d3:
            return _get_recursive_results(td, iter_key, get_keys, h)
        h.append({k: td.get(k) for k in get_keys})
    else:
        return h
查看更多
▲ chillily
4楼-- · 2020-06-17 07:45

Use generator

With following generator:

def get_stuff(dct, iter_keys, get_keys):
    k, stuff = get_keys
    l, m = iter_keys
    if k in dct:
        yield {k: dct[k], stuff: dct[stuff]}
        if dct.get(l):
            for subdct in dct[l][m]:
                for res in get_stuff(subdct, iter_keys, get_keys):
                    yield res


list(get_stuff(d, ["l", "m"], ["k", "stuff"]))

you get the results by:

list(get_stuff(d))

Python 3.3 provides new yield from expression used to delegate yielding to subgenerator. Using this expression, the code can be one line shorter:

def get_stuff(dct):
    if "k" in dct:
        yield {"k": dct["k"], "stuff": dct["stuff"]}
        if dct.get("l"):
            for subdct in dct["l"]["m"]:
                yield from get_stuff(subdct)

def get_stuff(dct, iter_keys, get_keys):
    k, stuff = get_keys
    l, m = iter_keys
    if k in dct:
        yield {k: dct[k], stuff: dct[stuff]}
        if dct.get(l):
            for subdct in dct[l][m]:
                yield from get_stuff(subdct, iter_keys, get_keys):

Some methods to avoid globals

generators

Often, if you need to build up a list and search for replacing global variables, generators might come handy as they keep status of current work in its local variables plus building up the whole result is postponed to consuming generated values.

recursion

Recursion store the subresults in local variables in stack.

class instance with internal property

A class can serve as a tin to encapsulate your variables.

Instead of using global variable, you store intermediate result in instance property.

Generalize for different data structures

In your comments you mentioned, that you receive many different types with each dump.

I will assume, that your data fulfill following expectations:

  • has tree-like structure
  • each node in the tree shall contribute to result some result (e.g. a dictionary {"k": xx, "stuff": yy})
  • each node might contain subitems (list of subnodes)

One option to make the solution more general is to provide list of keys to use to access the value/subitems, another option is to provide a function, which does the work of getting the node value and subitems.

Here I use get_value to deliver node value and get_subitems to deliver subnodes:

def get_value(data):
    try:
        return {"k": data["k"], "stuff": data["stuff"]}
    except KeyError:
        return None


def get_subitems(data):
    try:
        return data["l"]["m"]
    except TypeError:
        return None

The processing is then done by:

def get_stuff(dct, get_value_fun, get_subitems_fun):
    value = get_value(dct)
    if value:
        yield value
        lst = get_subitems_fun(dct)
        if lst:
            for subdct in lst:
                for res in get_stuff(subdct, get_value_fun, get_subitems_fun):
                    yield res

called in this way:

get_stuff(d, get_value, get_subitems)

Advantage of using functions is that it is much more flexible for whatever data structures you would have to process (adapting to other data structures would require only providing customized version of function get_value and get_subitems - having the same or different names according to your preferences.

查看更多
干净又极端
5楼-- · 2020-06-17 07:45

Edit: First version had a bug which is now corrected

I believe this should work, we're using the power of recursion!

def strip_leaves_from_tree(my_tree):
    result = list()
    row = dict()
    for key in my_tree:
        child = my_tree[key]
        if type(child) in (int, str,):
            row[key] = child
        elif isinstance(child, dict):
            result = strip_leaves_from_tree(child)
        elif isinstance(child, list):
            for element in child:
                result += strip_leaves_from_tree(element)
    if row: result = [row,]+result
    return result
查看更多
Bombasti
6楼-- · 2020-06-17 07:58

I verified it works. Please check it. Of course, it should be modified when you change the structure of dictionary-list.

def add(ret, val):
  if val is not None: ret.append(val)

def flatten(d, ret):
  for k,v in d.items():
    if isinstance(v, dict): add(ret,flatten(v, ret))
    elif isinstance(v, list):
        for i in v: add(ret, flatten(i, ret))
    elif k=='k':
        ret.append({'k':v,'stuff':d.get('stuff')})

ret = []
flatten(d, ret)
查看更多
女痞
7楼-- · 2020-06-17 07:59

This is not as generic but it does the job:

def parse_tree(d, keys):
   result = [{key: d[key] for key in keys}]
   l = d.get('l', None)
   if l is not None:
       entries = l.get('m', [])
       for entry in entries:
           result.extend(parse_tree(entry))
   return result


>>> parse_tree(d, ['k', 'stuff'])
[{'k': 1, 'stuff': 's1'},
 {'k': 2, 'stuff': 's2'},
 {'k': 3, 'stuff': 's3'},
 {'k': 4, 'stuff': 's4'},
 {'k': 5, 'stuff': 's5'},
 {'k': 6, 'stuff': 's6'}]
查看更多
登录 后发表回答