Dictionaries of dictionaries merge

2018-12-31 16:07发布

I need to merge multiple dictionaries, here's what I have for instance:

dict1 = {1:{"a":{A}}, 2:{"b":{B}}}

dict2 = {2:{"c":{C}}, 3:{"d":{D}}

With A B C and D being leaves of the tree, like {"info1":"value", "info2":"value2"}

There is an unknown level(depth) of dictionaries, it could be {2:{"c":{"z":{"y":{C}}}}}

In my case it represents a directory/files structure with nodes being docs and leaves being files.

I want to merge them to obtain:

 dict3 = {1:{"a":{A}}, 2:{"b":{B},"c":{C}}, 3:{"d":{D}}}

I'm not sure how I could do that easily with Python.

21条回答
深知你不懂我心
2楼-- · 2018-12-31 16:24

Dictionaries of dictionaries merge

As this is the canonical question (in spite of certain non-generalities) I'm providing the canonical Pythonic approach to solving this issue.

Simplest Case: "leaves are nested dicts that end in empty dicts":

d1 = {'a': {1: {'foo': {}}, 2: {}}}
d2 = {'a': {1: {}, 2: {'bar': {}}}}
d3 = {'b': {3: {'baz': {}}}}
d4 = {'a': {1: {'quux': {}}}}

This is the simplest case for recursion, and I would recommend two naive approaches:

def rec_merge1(d1, d2):
    '''return new merged dict of dicts'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge1(v, d2[k])
    d3 = d1.copy()
    d3.update(d2)
    return d3

def rec_merge2(d1, d2):
    '''update first dict with second recursively'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge2(v, d2[k])
    d1.update(d2)
    return d1

I believe I would prefer the second to the first, but keep in mind that the original state of the first would have to be rebuilt from its origin. Here's the usage:

>>> from functools import reduce # only required for Python 3.
>>> reduce(rec_merge1, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}
>>> reduce(rec_merge2, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}

Complex Case: "leaves are of any other type:"

So if they end in dicts, it's a simple case of merging the end empty dicts. If not, it's not so trivial. If strings, how do you merge them? Sets can be updated similarly, so we could give that treatment, but we lose the order in which they were merged. So does order matter?

So in lieu of more information, the simplest approach will be to give them the standard update treatment if both values are not dicts: i.e. the second dict's value will overwrite the first, even if the second dict's value is None and the first's value is a dict with a lot of info.

d1 = {'a': {1: 'foo', 2: None}}
d2 = {'a': {1: None, 2: 'bar'}}
d3 = {'b': {3: 'baz'}}
d4 = {'a': {1: 'quux'}}

from collections import MutableMapping

def rec_merge(d1, d2):
    '''
    Update two dicts of dicts recursively, 
    if either mapping has leaves that are non-dicts, 
    the second's leaf overwrites the first's.
    '''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            # this next check is the only difference!
            if all(isinstance(e, MutableMapping) for e in (v, d2[k])):
                d2[k] = rec_merge(v, d2[k])
            # we could further check types and merge as appropriate here.
    d3 = d1.copy()
    d3.update(d2)
    return d3

And now

from functools import reduce
reduce(rec_merge, (d1, d2, d3, d4))

returns

{'a': {1: 'quux', 2: 'bar'}, 'b': {3: 'baz'}}

Application to the original question:

I've had to remove the curly braces around the letters and put them in single quotes for this to be legit Python (else they would be set literals in Python 2.7+) as well as append a missing brace:

dict1 = {1:{"a":'A'}, 2:{"b":'B'}}
dict2 = {2:{"c":'C'}, 3:{"d":'D'}}

and rec_merge(dict1, dict2) now returns:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

Which matches the desired outcome of the original question (after changing, e.g. the {A} to 'A'.)

查看更多
大哥的爱人
3楼-- · 2018-12-31 16:27

Based on answers from @andrew cooke. It takes care of nested lists in a better way.

def deep_merge_lists(original, incoming):
    """
    Deep merge two lists. Modifies original.
    Reursively call deep merge on each correlated element of list. 
    If item type in both elements are
     a. dict: call deep_merge_dicts on both values.
     b. list: Calls deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    If length of incoming list is more that of original then extra values are appended.
    """
    common_length = min(len(original), len(incoming))
    for idx in range(common_length):
        if isinstance(original[idx], dict) and isinstance(incoming[idx], dict):
            deep_merge_dicts(original[idx], incoming[idx])

        elif isinstance(original[idx], list) and isinstance(incoming[idx], list):
            deep_merge_lists(original[idx], incoming[idx])

        else:
            orginal[idx] = incoming[idx]

    for idx in range(common_length, len(incoming)):
        original.append(incoming[idx])


def deep_merge_dicts(original, incoming):
    """
    Deep merge two dictionaries. Modfies original.
    For key conflicts if both values are:
     a. dict: Recursivley call deep_merge_dicts on both values.
     b. list: Calls deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    """
    for key in incoming:
        if key in original:
            if isinstance(original[key], dict) and isinstance(incoming[key], dict):
                deep_merge_dicts(original[key], incoming[key])

            elif isinstance(original[key], list) and isinstance(incoming[key], list):
                deep_merge_lists(original[key], incoming[key])

            else:
                original[key] = incoming[key]
        else:
            original[key] = incoming[key]
查看更多
伤终究还是伤i
4楼-- · 2018-12-31 16:28

Overview

The following approach subdivides the problem of a deep merge of dicts into:

  1. A parameterized shallow merge function merge(f)(a,b) that uses a function f to merge two dicts a and b

  2. A recursive merger function f to be used together with merge


Implementation

A function for merging two (non nested) dicts can be written in a lot of ways. I personally like

def merge(f):
    def merge(a,b): 
        keys = a.keys() | b.keys()
        return {key:f(*[a.get(key), b.get(key)]) for key in keys}
    return merge

A nice way of defining an appropriate recurrsive merger function f is using multipledispatch which allows to define functions that evaluate along different paths depending on the type of their arguments.

from multipledispatch import dispatch

#for anything that is not a dict return
@dispatch(object, object)
def f(a, b):
    return b if b is not None else a

#for dicts recurse 
@dispatch(dict, dict)
def f(a,b):
    return merge(f)(a,b)

Example

To merge two nested dicts simply use merge(f) e.g.:

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}
merge(f)(dict1, dict2)
#returns {1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}} 

Notes:

The advantages of this approach are:

  • The function is build from smaller functions that each do a single thing which makes the code simpler to reason about and test

  • The behaviour is not hard-coded but can be changed and extended as needed which improves code reuse (see example below).


Customization

Some answers also considered dicts that contain lists e.g. of other (potentially nested) dicts. In this case one might want map over the lists and merge them based on position. This can be done by adding another definition to the merger function f:

import itertools
@dispatch(list, list)
def f(a,b):
    return [merge(f)(*arg) for arg in itertools.zip_longest(a,b,fillvalue={})]
查看更多
春风洒进眼中
5楼-- · 2018-12-31 16:28

I have another slightly different solution here:

def deepMerge(d1, d2, inconflict = lambda v1,v2 : v2) :
''' merge d2 into d1. using inconflict function to resolve the leaf conflicts '''
    for k in d2:
        if k in d1 : 
            if isinstance(d1[k], dict) and isinstance(d2[k], dict) :
                deepMerge(d1[k], d2[k], inconflict)
            elif d1[k] != d2[k] :
                d1[k] = inconflict(d1[k], d2[k])
        else :
            d1[k] = d2[k]
    return d1

By default it resolves conflicts in favor of values from the second dict, but you can easily override this, with some witchery you may be able to even throw exceptions out of it. :).

查看更多
冷夜・残月
6楼-- · 2018-12-31 16:30
class Utils(object):

    """

    >>> a = { 'first' : { 'all_rows' : { 'pass' : 'dog', 'number' : '1' } } }
    >>> b = { 'first' : { 'all_rows' : { 'fail' : 'cat', 'number' : '5' } } }
    >>> Utils.merge_dict(b, a) == { 'first' : { 'all_rows' : { 'pass' : 'dog', 'fail' : 'cat', 'number' : '5' } } }
    True

    >>> main = {'a': {'b': {'test': 'bug'}, 'c': 'C'}}
    >>> suply = {'a': {'b': 2, 'd': 'D', 'c': {'test': 'bug2'}}}
    >>> Utils.merge_dict(main, suply) == {'a': {'b': {'test': 'bug'}, 'c': 'C', 'd': 'D'}}
    True

    """

    @staticmethod
    def merge_dict(main, suply):
        """
        获取融合的字典,以main为主,suply补充,冲突时以main为准
        :return:
        """
        for key, value in suply.items():
            if key in main:
                if isinstance(main[key], dict):
                    if isinstance(value, dict):
                        Utils.merge_dict(main[key], value)
                    else:
                        pass
                else:
                    pass
            else:
                main[key] = value
        return main

if __name__ == '__main__':
    import doctest
    doctest.testmod()
查看更多
只靠听说
7楼-- · 2018-12-31 16:32

I've been testing your solutions and decided to use this one in my project:

def mergedicts(dict1, dict2, conflict, no_conflict):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            yield (k, conflict(dict1[k], dict2[k]))
        elif k in dict1:
            yield (k, no_conflict(dict1[k]))
        else:
            yield (k, no_conflict(dict2[k]))

dict1 = {1:{"a":"A"}, 2:{"b":"B"}}
dict2 = {2:{"c":"C"}, 3:{"d":"D"}}

#this helper function allows for recursion and the use of reduce
def f2(x, y):
    return dict(mergedicts(x, y, f2, lambda x: x))

print dict(mergedicts(dict1, dict2, f2, lambda x: x))
print dict(reduce(f2, [dict1, dict2]))

Passing functions as parameteres is key to extend jterrace solution to behave as all the other recursive solutions.

查看更多
登录 后发表回答