How to implement autovivification for nested dicti

2020-07-10 07:12发布

TL;DR
How can I get superkeys to be autovivified in a Python dict when assigning values to subkeys, without also getting them autovivified when checking for subkeys?

Background: Normally in Python, setting values in a nested dictionary requires manually ensuring that higher-level keys exist before assigning to their sub-keys. That is,

my_dict[1][2] = 3

will not reliably work as intended without first doing something like

if 1 not in my_dict:
    my_dict[1] = {}

Now, it is possible to set up a kind of autovivification by making my_dict an instance of a class that overrides __missing__, as shown e.g. in https://stackoverflow.com/a/19829714/6670909.

Question: However, that solution silently autovivifies higher-level keys if you check for the existence of a sub-key in such a nested dict. That leads to the following unfortunateness:

>>> vd = Vividict()
>>> 1 in vd
False
>>> 2 in vd[1]
False
>>> 1 in vd
True

How can I avoid that misleading result? In Perl, by the way, I can get the desired behavior by doing

no autovivification qw/exists/;

And basically I'd like to replicate that behavior in Python if possible.

2条回答
贪生不怕死
2楼-- · 2020-07-10 07:50

The proper answer is: don't do this in Python, since explicit is better than implicit.

But if you really want autovivification that does not keep empty sub-dictionaries, one can emulate the behavior in Python.

try:
    from collections import MutableMapping
except:
    from collections.abc import MutableMapping


class AutoDict(MutableMapping, object):
    def __init__(self, *args, **kwargs):
        super(AutoDict, self).__init__()
        self.data = dict(*args, **kwargs)

    def __getitem__(self, key):
        if key in self.data:
            return self.data.__getitem__(key)
        else:
            return ChildAutoDict(parent=self, parent_key=key)

    def __setitem__(self, key, value):
        return self.data.__setitem__(key, value)

    def __delitem__(self, key):
        return self.data.__delitem__(key)

    def __iter__(self):
        return self.data.__iter__()

    def __len__(self):
        return self.data.__len__()

    def keys(self):
        return self.data.keys()

    def __contains__(self, key):
       return data.__contains__(key)

    def __str__(self):
        return str(self.data)

    def __unicode__(self):
        return unicode(self.data)

    def __repr__(self):
        return repr(self.data)

class ChildAutoDict(AutoDict):
    def __init__(self, parent, parent_key):
        super(ChildAutoDict, self).__init__()
        self.parent = parent
        self.parent_key = parent_key

    def __setitem__(self, key, value):
        if self.parent is not None and not self.parent_key in self.parent:
            # if parent got a new key in the meantime,
            # don't add ourselves
            self.parent.data[self.parent_key] = self
        else:
           self.parent = None
        return self.data.__setitem__(key, value)

    def __delitem__(self, key):
        ret = self.data.__delitem__(key)
        # only remove ourselves from the parent if we are 
        # still occupying our slot.
        if not self and self.parent and self is self.parent[parent_key]:
            self.parent.data.pop(self.parent_key)
        return ret

What you get back from the __getitem__() is essentially a dictionary facade that adds itself to the parent dictionary only if itself is not empty and removes itself once it becomes empty.

All of this --of course-- stops working once you assign a "normal" dictionary somewhere in the middle, i.e. d[2] = {}, d[2][3] = {} doesn't work any more and so on.

I have not really tested this thoroughly, so beware of more pitfalls.

d = AutoDict()

print(1 in d)
>>> False
print(d)
>>> {}

print(d[2][3])
>>> {}
print(d[2])
>>> {}
print(d)
>>> {}

d[2][3] = 1
print(d)
>>> {2: {3: 1}}

del d[2][3]
print(d)
>>> {}
查看更多
Emotional °昔
3楼-- · 2020-07-10 07:53

This is not an easy problem to solve, because in your example:

my_dict[1][2] = 3

my_dict[1] results in a __getitem__ call on the dictionary. There is no way at that point to know that an assignment is being made. Only the last [] in the sequence is a __setitem__ call, and it can't succeed unless mydict[1] exists, because otherwise, what object are you assigning into?

So don't use autovivication. You can use setdefault() instead, with a regular dict.

my_dict.setdefault(1, {})[2] = 3

Now that's not exactly pretty, especially when you are nesting more deeply, so you might write a helper method:

class MyDict(dict):
    def nest(self, keys, value):
       for key in keys[:-1]:
          self = self.setdefault(key, {})
       self[keys[-1]] = value

 my_dict = MyDict()
 my_dict.nest((1, 2), 3)       # my_dict[1][2] = 3

But even better is to wrap this into a new __setitem__ that takes all the indexes at once, instead of requiring the intermediate __getitem__ calls that induce the autovivication. This way, we know from the beginning that we're doing an assignment and can proceed without relying on autovivication.

class MyDict(dict):
    def __setitem__(self, keys, value):
       if not isinstance(keys, tuple):
           return dict.__setitem__(self, keys, value)
       for key in keys[:-1]:
          self = self.setdefault(key, {})
       dict.__setitem__(self, keys[-1], value)

my_dict = MyDict()
my_dict[1, 2] = 3

For consistency, you could also provide __getitem__ that accepts keys in a tuple as follows:

def __getitem__(self, keys):
   if not isinstance(keys, tuple):
       return dict.__getitem__(self, keys)
   for key in keys:
       self = dict.__getitem__(self, key)
   return self

The only downside I can think of is that we can't use tuples as dictionary keys as easily: we have to write that as, e.g. my_dict[(1, 2),].

查看更多
登录 后发表回答