How can I make as "perfect" a subclass of dict as possible? The end goal is to have a simple dict in which the keys are lowercase.
It would seem that there should be some tiny set of primitives I can override to make this work, but according to all my research and attempts it seems like this isn't the case:
- If I override __getitem__/__setitem__, then get/set don't work. How can I make them work? Surely I don't need to implement them individually?
- Am I preventing pickling from working, and do I need to implement __setstate__ etc?
- Should I just use mutablemapping (it seems one shouldn't use UserDict or DictMixin)? If so, how? The docs aren't exactly enlightening.
Here is my first go at it, get() doesn't work and no doubt there are many other minor problems:

    class arbitrary_dict(dict):
        """A dictionary that applies an arbitrary key-altering function
           before accessing the keys."""

        def __keytransform__(self, key):
            return key

        # Overridden methods. List from
        # https://stackoverflow.com/questions/2390827/how-to-properly-subclass-dict

        def __init__(self, *args, **kwargs):
            self.update(*args, **kwargs)

        # Note: I'm using dict directly, since super(dict, self) doesn't work.
        # I'm not sure why, perhaps dict is not a new-style class.

        def __getitem__(self, key):
            return dict.__getitem__(self, self.__keytransform__(key))

        def __setitem__(self, key, value):
            return dict.__setitem__(self, self.__keytransform__(key), value)

        def __delitem__(self, key):
            return dict.__delitem__(self, self.__keytransform__(key))

        def __contains__(self, key):
            return dict.__contains__(self, self.__keytransform__(key))


    class lcdict(arbitrary_dict):
        def __keytransform__(self, key):
            return str(key).lower()

You can write an object that behaves like a dict quite easily with ABCs (Abstract Base Classes) from the collections.abc module. It even tells you if you missed a method, so below is the minimal version that shuts the ABC up.
You get a few free methods from the ABC:
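A minimal sketch of what such a class and its free methods might look like, assuming Python 3 (collections.abc). The names TransformedDict, LowerKeyDict, store and _keytransform are illustrative:

```python
from collections.abc import MutableMapping


class TransformedDict(MutableMapping):
    """A mapping that applies a key-altering function before touching keys."""

    def __init__(self, *args, **kwargs):
        self.store = {}
        # update() is one of the free methods; it routes through __setitem__
        self.update(dict(*args, **kwargs))

    # The five methods the ABC insists on:
    def __getitem__(self, key):
        return self.store[self._keytransform(key)]

    def __setitem__(self, key, value):
        self.store[self._keytransform(key)] = value

    def __delitem__(self, key):
        del self.store[self._keytransform(key)]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)

    def _keytransform(self, key):
        return key


class LowerKeyDict(TransformedDict):
    def _keytransform(self, key):
        return key.lower()


s = LowerKeyDict([('Test', 'test')])

# get, __contains__, pop, keys, items, update, setdefault, ... come from the ABC
# and are all built on the five methods above, so they see the transform too.
found_by_get = s.get('TEST')
found_by_in = 'TeSt' in s
is_real_dict = isinstance(s, dict)   # False: this is a mapping, not a dict subclass
```

Leaving out any of the five required methods makes instantiation fail with a TypeError that names the missing method.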
I wouldn't subclass dict (or other builtins) directly. It often makes no sense, because what you actually want to do is implement the interface of a dict. And that is exactly what ABCs are for.

After trying out both of the top two suggestions, I've settled on a shady-looking middle route for Python 2.7. Maybe 3 is saner, but for me:
which I really hate, but seems to fit my needs, which are:
- **my_dict: if you inherit from dict, this bypasses your code. Try it out.
- isinstance(my_dict, dict)
dict
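A minimal sketch of the middle route described here, assuming Python 3 and MutableMapping as the base class. The class name, the _store attribute and the lowercasing transform are illustrative and not the original code:

```python
from collections.abc import MutableMapping


class MasqueradingLowerDict(MutableMapping):
    """A MutableMapping that reports dict as its __class__ (hypothetical name)."""

    def __init__(self, *args, **kwargs):
        self._store = {}
        self.update(dict(*args, **kwargs))

    @property
    def __class__(self):
        # isinstance() falls back to obj.__class__ when type(obj) does not match,
        # so this makes isinstance(obj, dict) succeed without inheriting from dict.
        return dict

    def __getitem__(self, key):
        return self._store[key.lower()]

    def __setitem__(self, key, value):
        self._store[key.lower()] = value

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


d = MasqueradingLowerDict(Foo=1)

passes_isinstance = isinstance(d, dict)   # True, via the __class__ property
real_type_is_dict = type(d) is dict       # False, type() is not fooled
splatted = dict(**d)                      # goes through keys() and __getitem__
```

Code that checks type(obj) is dict, or C code that needs a real dict, will still see the difference.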
If you need to tell yourself apart from others, personally I use something like this (though I'd recommend better names):
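A small sketch of that idea, assuming a class called MyDict. The method name __am_i_me comes from the text, while is_it_me is a hypothetical helper added for illustration:

```python
class MyDict:
    def __am_i_me(self):
        # Stored on the class as _MyDict__am_i_me because of name mangling.
        return True

    @classmethod
    def is_it_me(cls, other):
        # Inside the class body, other.__am_i_me is compiled as
        # other._MyDict__am_i_me, so only MyDict instances answer.
        try:
            return other.__am_i_me()
        except AttributeError:
            return False


mine = MyDict.is_it_me(MyDict())
not_mine = MyDict.is_it_me({})
mangled_name_exists = hasattr(MyDict(), '_MyDict__am_i_me')
plain_name_exists = hasattr(MyDict(), '__am_i_me')
```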
As long as you only need to recognize yourself internally, this way it's harder to accidentally call __am_i_me due to Python's name mangling (this is renamed to _MyDict__am_i_me from anything calling outside this class). Slightly more private than _methods, both in practice and culturally.

So far I have no complaints, aside from the seriously-shady-looking __class__ override. I'd be thrilled to hear of any problems that others encounter with this though, I don't fully understand the consequences. But so far I've had no problems whatsoever, and this allowed me to migrate a lot of middling-quality code in lots of locations without needing any changes.

As evidence: https://repl.it/repls/TraumaticToughCockatoo
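The bypass can also be reproduced locally. The following is a minimal sketch, assuming CPython 3. SpyDict and SpyMapping are hypothetical names, and this is not the code behind the link above:

```python
from collections.abc import MutableMapping


class SpyDict(dict):
    """dict subclass that records every __getitem__ call."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(key)
        return super().__getitem__(key)


class SpyMapping(MutableMapping):
    """MutableMapping that records every __getitem__ call."""

    def __init__(self, *args, **kwargs):
        self.calls = []
        self._store = dict(*args, **kwargs)

    def __getitem__(self, key):
        self.calls.append(key)
        return self._store[key]

    def __setitem__(self, key, value):
        self._store[key] = value

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


spy_dict = SpyDict(a=1)
spy_mapping = SpyMapping(a=1)

# For a real dict (or a subclass that keeps dict's iterator), CPython copies the
# underlying storage directly and never calls the overridden __getitem__.
dict(**spy_dict)

# For any other mapping, ** unpacking goes through keys() and __getitem__.
dict(**spy_mapping)
```

After running this, spy_dict.calls is still empty while spy_mapping.calls has recorded the lookup.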
Basically: copy the current #2 option, add print 'method_name' lines to every method, and then try this and watch the output:

You'll see similar behavior for other scenarios. Say your fake-dict is a wrapper around some other datatype, so there's no reasonable way to store the data in the backing-dict; **your_dict will be empty, regardless of what every other method does.

This works correctly for MutableMapping, but as soon as you inherit from dict it becomes uncontrollable.

My requirements were a bit stricter:
My initial thought was to substitute our clunky Path class for a case insensitive unicode subclass - but:
some_dict[CIstr(path)] is ugly)

So I finally had to write that case insensitive dict. Thanks to code by @AaronHall that was made 10 times easier.
Implicit vs explicit is still a problem, but once dust settles, renaming of attributes/variables to start with ci (and a big fat doc comment explaining that ci stands for case insensitive) I think is a perfect solution - as readers of the code must be fully aware that we are dealing with case insensitive underlying data structures. This will hopefully fix some hard to reproduce bugs, which I suspect boil down to case sensitivity.
Comments/corrections welcome :)
The accepted answer would be my first approach, but since it has some issues, and since no one has addressed the alternative, actually subclassing a dict, I'm going to do that here.

What's wrong with the accepted answer?

This seems like a rather simple request to me:

The accepted answer doesn't actually subclass dict, and a test for this fails:

Ideally, any type-checking code would be testing for the interface we expect, or an abstract base class, but if our data objects are being passed into functions that are testing for dict - and we can't "fix" those functions, this code will fail.

Other quibbles one might make:
- The accepted answer is missing fromkeys.

The accepted answer also has a redundant __dict__ - therefore taking up more space in memory:

Actually subclassing dict
We can reuse the dict methods through inheritance. All we need to do is create an interface layer that ensures keys are passed into the dict in lowercase form if they are strings.
Well, implementing them each individually is the downside to this approach and the upside to using MutableMapping (see the accepted answer), but it's really not that much more work.

First, let's factor out the difference between Python 2 and 3, create a singleton (_RaiseKeyError) to make sure we know if we actually get an argument to dict.pop, and create a function to ensure our string keys are lowercase:

Now we implement - I'm using super with the full arguments so that this code works for Python 2 and 3:

We use an almost boiler-plate approach for any method or special method that references a key, but otherwise, by inheritance, we get methods: len, clear, items, keys, popitem, and values for free. While this required some careful thought to get right, it is trivial to see that this works.

(Note that has_key was deprecated in Python 2, removed in Python 3.)

Here's some usage:
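A sketch of how the pieces described above might fit together, with some usage at the end. It assumes Python 3 only, so it uses zero-argument super() and str rather than a Python 2/3 shim. LowerDict, ensure_lower and _process_args are illustrative names, and only _RaiseKeyError is taken from the text:

```python
from itertools import chain

_RaiseKeyError = object()  # sentinel: "no default was passed to pop"


def ensure_lower(maybe_str):
    """Keys can be any hashable object; only lowercase the strings."""
    return maybe_str.lower() if isinstance(maybe_str, str) else maybe_str


class LowerDict(dict):
    __slots__ = ()  # avoid a redundant per-instance __dict__

    @staticmethod
    def _process_args(mapping=(), **kwargs):
        if hasattr(mapping, 'items'):
            mapping = mapping.items()
        return ((ensure_lower(k), v)
                for k, v in chain(mapping, kwargs.items()))

    def __init__(self, mapping=(), **kwargs):
        super().__init__(self._process_args(mapping, **kwargs))

    def __getitem__(self, k):
        return super().__getitem__(ensure_lower(k))

    def __setitem__(self, k, v):
        return super().__setitem__(ensure_lower(k), v)

    def __delitem__(self, k):
        return super().__delitem__(ensure_lower(k))

    def __contains__(self, k):
        return super().__contains__(ensure_lower(k))

    def get(self, k, default=None):
        return super().get(ensure_lower(k), default)

    def setdefault(self, k, default=None):
        return super().setdefault(ensure_lower(k), default)

    def pop(self, k, v=_RaiseKeyError):
        if v is _RaiseKeyError:
            return super().pop(ensure_lower(k))  # raises KeyError if missing
        return super().pop(ensure_lower(k), v)

    def update(self, mapping=(), **kwargs):
        super().update(self._process_args(mapping, **kwargs))

    def copy(self):
        return type(self)(self)

    @classmethod
    def fromkeys(cls, keys, v=None):
        return super().fromkeys((ensure_lower(k) for k in keys), v)


# Some usage:
ld = LowerDict({'FOO': 1}, BaR=2)
ld['Baz'] = 3
ld.update({'QUX': 4})
popped = ld.pop('BAR')
missing = ld.pop('nope', None)
from_keys = LowerDict.fromkeys('ABC')
```

Anything not overridden here still talks to the raw storage. For example, the | and |= operators added in Python 3.9 do not lowercase keys unless they are overridden as well.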
pickling
And the dict subclass pickles just fine:
__repr__
We defined update and __init__, but you have a beautiful __repr__ by default:

However, it's good to write a __repr__ to improve the debuggability of your code. The ideal test is eval(repr(obj)) == obj. If it's easy to do for your code, I strongly recommend it:

You see, it's exactly what we need to recreate an equivalent object - this is something that might show up in our logs or in backtraces:
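One way such a __repr__ could look, as a sketch assuming Python 3. The class below is a stripped-down stand-in named LowerDict that shows only the __repr__ part:

```python
class LowerDict(dict):
    """Stand-in dict subclass; only __repr__ is shown here."""

    def __repr__(self):
        # Wrap the inherited dict repr in the class name so the result
        # can be pasted back in as a constructor call.
        return '{0}({1})'.format(type(self).__name__, super().__repr__())


ld = LowerDict({'a': 1})
text = repr(ld)

# The round-trip test from the text, with the class passed in explicitly
# so eval can resolve the name.
round_tripped = eval(text, {'LowerDict': LowerDict})
```

Using type(self).__name__ rather than a hard-coded name keeps the repr accurate for further subclasses.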
Conclusion
Yeah, these are a few more lines of code, but they're intended to be comprehensive. My first inclination would be to use the accepted answer, and if there were issues with it, I'd then look at my answer - as it's a little more complicated, and there's no ABC to help me get my interface right.
Premature optimization is going for greater complexity in search of performance. MutableMapping is simpler - so it gets an immediate edge, all else being equal. Nevertheless, to lay out all the differences, let's compare and contrast.

I should add that there was a push to put a similar dictionary into the collections module, but it was rejected. You should probably just do this instead:

It should be far more easily debuggable.
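The "just do this instead" suggestion presumably amounts to keeping a plain dict and transforming keys at the call site. A minimal sketch, where transform is a hypothetical helper name:

```python
def transform(key):
    """Lowercase string keys; leave other hashables alone (illustrative helper)."""
    return key.lower() if isinstance(key, str) else key


my_dict = {}
my_dict[transform('Foo')] = 1

value = my_dict[transform('FOO')]
present = transform('fOO') in my_dict
```

The transformation is visible at every access, so there is no hidden behavior to trace when something goes wrong.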
Compare and contrast
There are 6 interface functions implemented with the MutableMapping (which is missing fromkeys) and 11 with the dict subclass. I don't need to implement __iter__ or __len__, but instead I have to implement get, setdefault, pop, update, copy, __contains__, and fromkeys - but these are fairly trivial, since I can use inheritance for most of those implementations.

The MutableMapping implements some things in Python that dict implements in C - so I would expect a dict subclass to be more performant in some cases.

We get a free __eq__ in both approaches - both of which assume equality only if another dict is all lowercase - but again, I think the dict subclass will compare more quickly.

Summary:
- MutableMapping is simpler with fewer opportunities for bugs, but slower, takes more memory (see redundant dict), and fails isinstance(x, dict)
- dict is faster, uses less memory, and passes isinstance(x, dict), but it has greater complexity to implement.

Which is more perfect? That depends on your definition of perfect.
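The "redundant dict" point in the summary can be checked with a small sketch, assuming Python 3. ViaMapping and ViaDict are hypothetical stand-ins for the two approaches:

```python
from collections.abc import MutableMapping


class ViaMapping(MutableMapping):
    """Wrapper approach: data lives in self._store, which sits in __dict__."""

    def __init__(self):
        self._store = {}

    def __getitem__(self, key):
        return self._store[key]

    def __setitem__(self, key, value):
        self._store[key] = value

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


class ViaDict(dict):
    """Subclass approach: the instance itself is the storage."""
    __slots__ = ()


mapping_has_dict = hasattr(ViaMapping(), '__dict__')   # extra per-instance dict
subclass_has_dict = hasattr(ViaDict(), '__dict__')     # none, thanks to __slots__
mapping_attrs = vars(ViaMapping())
```

The wrapper carries an instance __dict__ whose only job is to hold the real storage dict, whereas the subclass with empty __slots__ has no such extra layer.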
All you will have to do is
OR
A sample usage for my personal use
Note: tested only in python3