Dictionaries of dictionaries merge

Posted 2018-12-31 16:07

I need to merge multiple dictionaries, here's what I have for instance:

dict1 = {1:{"a":{A}}, 2:{"b":{B}}}

dict2 = {2:{"c":{C}}, 3:{"d":{D}}}

With A, B, C and D being leaves of the tree, like {"info1":"value", "info2":"value2"}

The nesting depth of the dictionaries is unknown; it could be, for example, {2:{"c":{"z":{"y":{C}}}}}

In my case it represents a directory/files structure with nodes being docs and leaves being files.

I want to merge them to obtain:

 dict3 = {1:{"a":{A}}, 2:{"b":{B},"c":{C}}, 3:{"d":{D}}}

I'm not sure how I could do that easily with Python.

#2 · 2018-12-31 16:13

I had two dictionaries (a and b) which could each contain any number of nested dictionaries. I wanted to recursively merge them, with b taking precedence over a.

Considering the nested dictionaries as trees, what I wanted was:

  • To update a so that every path to every leaf in b would be represented in a
  • To overwrite subtrees of a if a leaf is found in the corresponding path in b
    • Maintain the invariant that all b leaf nodes remain leaves.

The existing answers were a little complicated for my taste and left some details on the shelf. I hacked together the following, which passes unit tests for my data set.

  def merge_map(a, b):
    if not isinstance(a, dict) or not isinstance(b, dict):
      return b

    for key in b.keys():
      a[key] = merge_map(a[key], b[key]) if key in a else b[key]
    return a

Example (formatted for clarity):

  a = {
    1 : {'a': 'red',
         'b': {'blue': 'fish', 'yellow': 'bear'},
         'c': {'orange': 'dog'}},
    2 : {'d': 'green'},
    3: 'e'
  }

  b = {
    1 : {'b': 'white'},
    2 : {'d': 'black'},
    3: 'e'
  }

  >>> merge_map(a, b)
  {1: {'a': 'red', 
       'b': 'white',
       'c': {'orange': 'dog'},},
   2: {'d': 'black'},
   3: 'e'}

The paths in b that needed to be maintained were:

  • 1 -> 'b' -> 'white'
  • 2 -> 'd' -> 'black'
  • 3 -> 'e'.

a had the unique and non-conflicting paths of:

  • 1 -> 'a' -> 'red'
  • 1 -> 'c' -> 'orange' -> 'dog'

so they are still represented in the merged map.
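
Note that merge_map updates a in place and returns it, and that values taken from b are shared with the result rather than copied. If both inputs need to stay untouched, one option is to merge deep copies. A minimal sketch, assuming the merge_map definition above; merged_copy is a hypothetical helper name, not part of the original answer:

```python
import copy

def merge_map(a, b):
    # Same logic as above: b wins whenever either side is not a dict.
    if not isinstance(a, dict) or not isinstance(b, dict):
        return b
    for key in b:
        a[key] = merge_map(a[key], b[key]) if key in a else b[key]
    return a

def merged_copy(a, b):
    # Hypothetical helper: merge deep copies so neither input is mutated
    # and the result shares no nested dicts with a or b.
    return merge_map(copy.deepcopy(a), copy.deepcopy(b))

a = {1: {'a': 'red', 'b': {'blue': 'fish'}}, 2: {'d': 'green'}}
b = {1: {'b': 'white'}, 2: {'d': 'black'}, 3: {'e': 'new'}}
merged = merged_copy(a, b)
print(merged)  # the merged tree
print(a)       # a is left as it was
```

Copying both inputs costs extra memory on large trees, so this is only worth it when the originals are reused afterwards.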

#3 · 2018-12-31 16:14

One issue with this question is that the values of the dict can be arbitrarily complex pieces of data. Based upon these and other answers I came up with this code:

class YamlReaderError(Exception):
    pass

def data_merge(a, b):
    """Merge b into a and return the merged result.

    NOTE: tuples and arbitrary objects are not handled as it is totally ambiguous what should happen"""
    key = None
    # ## debug output
    # sys.stderr.write("DEBUG: %s to %s\n" %(b,a))
    try:
        # Python 3: str covers unicode text and int covers long
        if a is None or isinstance(a, (str, int, float)):
            # border case for first run or if a is a primitive
            a = b
        elif isinstance(a, list):
            # lists can be only appended
            if isinstance(b, list):
                # merge lists
                a.extend(b)
            else:
                # append to list
                a.append(b)
        elif isinstance(a, dict):
            # dicts must be merged
            if isinstance(b, dict):
                for key in b:
                    if key in a:
                        a[key] = data_merge(a[key], b[key])
                    else:
                        a[key] = b[key]
            else:
                raise YamlReaderError('Cannot merge non-dict "%s" into dict "%s"' % (b, a))
        else:
            raise YamlReaderError('NOT IMPLEMENTED "%s" into "%s"' % (b, a))
    except TypeError as e:
        raise YamlReaderError('TypeError "%s" in key "%s" when merging "%s" into "%s"' % (e, key, b, a))
    return a

My use case is merging YAML files where I only have to deal with a subset of possible data types. Hence I can ignore tuples and other objects. For me, sensible merge logic means:

  • replace scalars
  • append lists
  • merge dicts by adding missing keys and updating existing keys

Everything else, including anything unforeseen, results in an error.
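
The three rules can be seen on a small example. This is a compact sketch for Python 3 that follows the same rules as data_merge but raises a plain TypeError instead of YamlReaderError; the name merge_values and the sample configuration data are illustrative assumptions:

```python
def merge_values(a, b):
    """Merge b into a: replace scalars, append to lists, merge dicts."""
    if a is None or isinstance(a, (str, int, float)):
        return b                      # scalars are replaced
    if isinstance(a, list):
        if isinstance(b, list):
            a.extend(b)               # list + list: concatenate
        else:
            a.append(b)               # list + anything else: append
        return a
    if isinstance(a, dict):
        if not isinstance(b, dict):
            raise TypeError('cannot merge non-dict %r into dict %r' % (b, a))
        for key in b:
            a[key] = merge_values(a[key], b[key]) if key in a else b[key]
        return a
    # tuples, sets and arbitrary objects are rejected on purpose
    raise TypeError('unsupported type %r' % type(a))

base = {'name': 'app', 'ports': [80], 'db': {'host': 'localhost'}}
override = {'name': 'app-prod', 'ports': [443], 'db': {'user': 'admin'}}
result = merge_values(base, override)
print(result)
```

Here 'name' is replaced, 'ports' grows to two entries, and 'db' ends up with both the 'host' and the 'user' key.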

#4 · 2018-12-31 16:15

Here's an easy way to do it using generators:

def mergedicts(dict1, dict2):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            if isinstance(dict1[k], dict) and isinstance(dict2[k], dict):
                yield (k, dict(mergedicts(dict1[k], dict2[k])))
            else:
                # If one of the values is not a dict, you can't continue merging it.
                # Value from second dict overrides one in first and we move on.
                yield (k, dict2[k])
                # Alternatively, replace this with exception raiser to alert you of value conflicts
        elif k in dict1:
            yield (k, dict1[k])
        else:
            yield (k, dict2[k])

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}

print(dict(mergedicts(dict1, dict2)))

This prints:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}
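
The comment in mergedicts mentions raising an exception instead of letting the second value win. A minimal sketch of that variant, assuming that two different non-dict values for the same key should be an error and that equal values are fine; the names mergedicts_strict and MergeConflict are hypothetical:

```python
class MergeConflict(Exception):
    """Raised when both dicts hold different non-dict values for a key."""

def mergedicts_strict(dict1, dict2, path=()):
    for k in set(dict1) | set(dict2):
        if k in dict1 and k in dict2:
            v1, v2 = dict1[k], dict2[k]
            if isinstance(v1, dict) and isinstance(v2, dict):
                # both sides are dicts: keep merging one level down
                yield (k, dict(mergedicts_strict(v1, v2, path + (k,))))
            elif v1 == v2:
                # identical values are not a conflict
                yield (k, v1)
            else:
                where = ' -> '.join(repr(p) for p in path + (k,))
                raise MergeConflict('conflict at %s' % where)
        elif k in dict1:
            yield (k, dict1[k])
        else:
            yield (k, dict2[k])

dict1 = {1: {"a": "A"}, 2: {"b": "B"}}
dict2 = {2: {"c": "C"}, 3: {"d": "D"}}
merged = dict(mergedicts_strict(dict1, dict2))
print(merged)

try:
    dict(mergedicts_strict({2: {"b": "B"}}, {2: {"b": "X"}}))
except MergeConflict as exc:
    print(exc)
```

Because the function is a generator, the exception only surfaces once the result is consumed, for example by the dict() call.
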

#5 · 2018-12-31 16:15

The code will depend on your rules for resolving merge conflicts, of course. Here's a version which can take an arbitrary number of arguments and merges them recursively to an arbitrary depth, without using any object mutation. It uses the following rules to resolve merge conflicts:

  • dictionaries take precedence over non-dict values ({"foo": {...}} takes precedence over {"foo": "bar"})
  • later arguments take precedence over earlier arguments (if you merge {"a": 1}, {"a": 2}, and {"a": 3} in order, the result will be {"a": 3})

try:
    from collections.abc import Mapping
except ImportError:
    Mapping = dict

def merge_dicts(*dicts):
    """
    Return a new dictionary that is the result of merging the arguments together.
    In case of conflicts, later arguments take precedence over earlier arguments.
    """
    updated = {}
    # grab all keys
    keys = set()
    for d in dicts:
        keys = keys.union(set(d))

    for key in keys:
        values = [d[key] for d in dicts if key in d]
        # which ones are mapping types? (aka dict)
        maps = [value for value in values if isinstance(value, Mapping)]
        if maps:
            # if we have any mapping types, call recursively to merge them
            updated[key] = merge_dicts(*maps)
        else:
            # otherwise, just grab the last value we have, since later arguments
            # take precedence over earlier arguments
            updated[key] = values[-1]
    return updated

#6 · 2018-12-31 16:19

This simple recursive procedure will merge one dictionary into another while overriding conflicting keys:

#!/usr/bin/env python2.7

def merge_dicts(dict1, dict2):
    """ Recursively merges dict2 into dict1 """
    if not isinstance(dict1, dict) or not isinstance(dict2, dict):
        return dict2
    for k in dict2:
        if k in dict1:
            dict1[k] = merge_dicts(dict1[k], dict2[k])
        else:
            dict1[k] = dict2[k]
    return dict1

print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {2:{"c":"C"}, 3:{"d":"D"}}))
print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {1:{"a":"A"}, 2:{"b":"C"}}))


{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}
{1: {'a': 'A'}, 2: {'b': 'C'}}

#7 · 2018-12-31 16:20

There's a slight problem with andrew cooke's answer: in some cases it modifies the second argument b when you modify the returned dict. Specifically, it's because of this line:

if key in a:
    ...
else:
    a[key] = b[key]

If b[key] is a dict, it will simply be assigned to a, meaning any subsequent modifications to that dict will affect both a and b.

a = {}
b = {'1': {'2': 'b'}}
c = {'1': {'3': 'c'}}
merge(merge(a, b), c) # {'1': {'3': 'c', '2': 'b'}}
a # {'1': {'3': 'c', '2': 'b'}} (as expected)
b # {'1': {'3': 'c', '2': 'b'}} <----
c # {'1': {'3': 'c'}} (unmodified)

To fix this, the line would have to be substituted with this:

if isinstance(b[key], dict):
    a[key] = clone_dict(b[key])
else:
    a[key] = b[key]

Where clone_dict is:

def clone_dict(obj):
    clone = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            clone[key] = clone_dict(value)
        else:
            clone[key] = value
    return clone

Still, this obviously doesn't account for lists, sets and other container types, but I hope it illustrates the pitfalls when trying to merge dicts.
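
For the cloning step, the standard library's copy.deepcopy is an alternative that also copies nested lists, sets and other containers. Below is a small sketch of the aliasing problem and of a deepcopy-based fix. merge_shallow and merge_copying are hypothetical names for a simplified merge without conflict checks; they are not the code from the answer being discussed:

```python
import copy

def merge_shallow(a, b):
    # Simplified merge: values missing from a are assigned by reference.
    for key in b:
        if key in a and isinstance(a[key], dict) and isinstance(b[key], dict):
            merge_shallow(a[key], b[key])
        else:
            a[key] = b[key]
    return a

def merge_copying(a, b):
    # Same merge, but values taken from b are deep-copied first.
    for key in b:
        if key in a and isinstance(a[key], dict) and isinstance(b[key], dict):
            merge_copying(a[key], b[key])
        else:
            a[key] = copy.deepcopy(b[key])
    return a

b1 = {'1': {'2': 'b'}}
c = {'1': {'3': 'c'}}
merge_shallow(merge_shallow({}, b1), c)
print(b1)  # b1 has picked up the '3' key from c

b2 = {'1': {'2': 'b', 'tags': ['x']}}
result = merge_copying(merge_copying({}, b2), c)
result['1']['tags'].append('y')
print(b2)  # b2 is unchanged, including its nested list
```

deepcopy is slower than a hand-written dict-only clone, so whether it is the right trade-off depends on how large and how deeply nested the data is.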

And for completeness' sake, here is my version, to which you can pass multiple dicts:

from functools import reduce  # a builtin on Python 2, must be imported on Python 3

def merge_dicts(*args):
    def clone_dict(obj):
        clone = {}
        for key, value in obj.items():
            if isinstance(value, dict):
                clone[key] = clone_dict(value)
            else:
                clone[key] = value
        return clone

    def merge(a, b, path=None):
        path = path or []
        for key in b:
            if key in a:
                if isinstance(a[key], dict) and isinstance(b[key], dict):
                    merge(a[key], b[key], path + [str(key)])
                elif a[key] == b[key]:
                    pass  # same leaf value
                else:
                    raise Exception('Conflict at `{path}\''.format(path='.'.join(path + [str(key)])))
            else:
                if isinstance(b[key], dict):
                    a[key] = clone_dict(b[key])
                else:
                    a[key] = b[key]
        return a
    return reduce(merge, args, {})