OrderedDict comprehensions

Posted 2019-01-18 01:12

Question:

Can I extend Python's dict comprehension syntax to build other dict types, such as OrderedDict from the collections module or my own types that inherit from dict?

Just rebinding the name dict obviously doesn't work: the {key: value} syntax still gives you a plain old dict for both comprehensions and literals.

>>> from collections import OrderedDict
>>> olddict, dict = dict, OrderedDict
>>> {i: i*i for i in range(3)}.__class__
<type 'dict'>

So, if it's possible, how would I go about doing that? It's OK if it only works in CPython. For the syntax, I guess I would try an O{k: v} prefix, like the ones we have on r'various' u'string' b'objects'.

Note: Of course we can use a generator expression instead, but I'm more interested in seeing how hackable Python is in terms of the grammar.

Answer 1:

There is no direct way to change Python's syntax from within the language. A dictionary comprehension (or plain display) is always going to create a dict, and there's nothing you can do about that. If you're using CPython, it's using special bytecodes that generate a dict directly, which ultimately call the PyDict API functions and/or the same underlying functions used by that API. If you're using PyPy, those bytecodes are instead implemented on top of an RPython dict object which in turn is implemented on top of a compiled-and-optimized Python dict. And so on.
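
One way to see those bytecodes is the dis module. This is a small sketch, assuming CPython 3; the opnames helper is a hypothetical name made up for illustration, and the exact opcode names are an implementation detail that can change between versions:

```python
import dis
import types


def opnames(code):
    """Collect opcode names from a code object and any nested code objects."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        # Older CPython versions compile the comprehension body into a
        # separate code object stored among the constants.
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names


code = compile("{i: i * i for i in range(3)}", "<demo>", "eval")
ops = opnames(code)

# The dict-building opcodes show up here; no lookup of the name "dict" is
# involved, which is why rebinding that name changes nothing.
print(sorted(op for op in ops if "MAP" in op))
```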

There is an indirect way to do it, but you're not going to like it. If you read the docs on the import system, you'll see that it's the importer that searches for cached compiled code or calls the compiler, and the compiler that calls the parser, and so on. In Python 3.3+, almost everything in this chain either is written in pure Python or has an alternate pure Python implementation, meaning you can fork the code and do your own thing. That includes parsing source with your own PyParsing code that builds ASTs, or compiling a dict comprehension AST node into your own custom bytecode instead of the default, or post-processing the bytecode, or…
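
The stages of that chain can be walked by hand with the standard ast module and the compile and exec builtins. This is a minimal sketch, assuming Python 3; the source string and the "<demo>" filename are made up for illustration:

```python
import ast

source = "squares = {i: i * i for i in range(3)}"

# Parser stage: source text -> AST. The comprehension is an ast.DictComp node.
tree = ast.parse(source, filename="<demo>")
comp_node = tree.body[0].value
print(type(comp_node).__name__)

# Compiler stage: AST -> code object. A custom pipeline could edit the tree
# before this call, or post-process the code object after it.
code = compile(tree, "<demo>", "exec")

# Execution stage: roughly what a loader does with a module's code object.
namespace = {}
exec(code, namespace)
print(type(namespace["squares"]).__name__)
```

Left untouched, the pipeline produces a plain dict; any customization has to happen between these stages.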

In many cases, an import hook is sufficient; if not, you can always write a custom finder and loader.
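
As a rough illustration of the import-hook route, here is a sketch assuming Python 3.8 or later. The names DictCompToOrderedDict, OrderedDictCompLoader, and install are hypothetical, not part of any library. The loader rewrites each dict comprehension into an OrderedDict call on a generator expression, and the hook is limited to one directory so the rest of the import system is left alone:

```python
import ast
import os
import sys
from importlib.machinery import FileFinder, SourceFileLoader


class DictCompToOrderedDict(ast.NodeTransformer):
    """Rewrite {k: v for ...} into collections.OrderedDict((k, v) for ...)."""

    def visit_DictComp(self, node):
        self.generic_visit(node)  # rewrite nested comprehensions first
        pair = ast.Tuple(elts=[node.key, node.value], ctx=ast.Load())
        genexp = ast.GeneratorExp(elt=pair, generators=node.generators)
        # __import__('collections') avoids relying on the target module
        # having imported collections itself.
        module = ast.Call(
            func=ast.Name(id="__import__", ctx=ast.Load()),
            args=[ast.Constant(value="collections")],
            keywords=[],
        )
        factory = ast.Attribute(value=module, attr="OrderedDict", ctx=ast.Load())
        call = ast.Call(func=factory, args=[genexp], keywords=[])
        return ast.copy_location(call, node)


class OrderedDictCompLoader(SourceFileLoader):
    """Hypothetical loader that compiles .py files through the rewriter."""

    def source_to_code(self, data, path, *, _optimize=-1):
        tree = ast.parse(data, filename=path)
        tree = ast.fix_missing_locations(DictCompToOrderedDict().visit(tree))
        return compile(tree, path, "exec", dont_inherit=True, optimize=_optimize)

    def get_code(self, fullname):
        # Bypass the .pyc cache so rewritten and stock bytecode never mix.
        path = self.get_filename(fullname)
        return self.source_to_code(self.get_data(path), path)


def install(directory):
    """Use the custom loader for .py files found directly in `directory`."""
    target = os.path.abspath(directory)
    base_hook = FileFinder.path_hook((OrderedDictCompLoader, [".py"]))

    def hook(path):
        if os.path.abspath(path) != target:
            # Declining lets the regular path hooks handle this entry.
            raise ImportError("not handled by the OrderedDict comprehension hook")
        return base_hook(path)

    sys.path_hooks.insert(0, hook)
    sys.path_importer_cache.clear()  # drop finders cached before the hook existed
```

To try it, call install() with a directory that holds a module containing a dict comprehension, put that directory on sys.path, and import the module: the comprehension's result should then be an OrderedDict. Restricting the hook to one directory is a deliberate choice here, since rewriting every imported module, including the standard library, would be much riskier.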

If you're not already using Python 3.3 or later, I'd strongly suggest migrating before playing with this stuff. In older versions, it's harder, and less well documented, and you'll ultimately be putting in 10x the effort to learn something that will be obsolete whenever you do migrate.

Anyway, if this approach sounds interesting to you, you might want to take a look at MacroPy. You could borrow some code from it—and, maybe more importantly, learn how some of these features (that have no good examples in the docs) are used.

Or, if you're willing to settle for something less cool, you can just use MacroPy to build an "odict comprehension macro" and use that. (Note that, at the time of writing, MacroPy only worked in Python 2.7, not 3.x.) You can't quite get o{…}, but you can get, say, od[{…}], which isn't too bad. Download od.py, realmain.py, and main.py, and run python main.py to see it working. The key is this code, which takes a DictComp AST node, converts it to an equivalent GeneratorExp over key-value Tuples, and wraps it in a Call to collections.OrderedDict:

import ast

def od(tree, **kw):
    # tree is the ast.DictComp node found inside od[...].
    # Build the (key, value) tuple yielded for each item.
    pair = ast.Tuple(elts=[tree.key, tree.value], ctx=ast.Load())
    # Reuse the comprehension's for/if clauses in a generator expression.
    gx = ast.GeneratorExp(elt=pair, generators=tree.generators)
    # Emit collections.OrderedDict(<generator expression>).
    odict = ast.Attribute(value=ast.Name(id='collections', ctx=ast.Load()),
                          attr='OrderedDict', ctx=ast.Load())
    call = ast.Call(func=odict, args=[gx], keywords=[])
    return call

A different alternative is, of course, to modify the Python interpreter.

I would suggest dropping the O{…} syntax idea for your first go, and just making normal dict comprehensions compile to odicts. The good news is, you don't really need to change the grammar (which is beyond hairy…), just any one of:

  • the bytecodes that dictcomps compile to,
  • the way the interpreter runs those bytecodes, or
  • the implementation of the PyDict type

The bad news: while all of those are a lot easier than changing the grammar, none of them can be done from an extension module. (Well, you can do the first one by doing basically the same thing you'd do from pure Python… and you can do any of them by hooking the .so/.dll/.dylib to patch in your own functions, but that's the exact same work as hacking on Python plus the extra work of hooking at runtime.)

If you want to hack on CPython source, the code you want is in Python/compile.c, Python/ceval.c, and Objects/dictobject.c, and the dev guide tells you how to find everything you need. But you might want to consider hacking on PyPy source instead, since it's mostly written in (a subset of) Python rather than C.


As a side note, your attempt wouldn't have worked even if everything were done at the Python language level. olddict, dict = dict, OrderedDict creates a binding named dict in your module's globals, which shadows the name in builtins, but doesn't replace it. You can replace things in builtins (well, Python doesn't guarantee this, but there are implementation/version-specific things-that-happen-to-work for every implementation/version I've tried…), but what you did isn't the way to do it.
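
The difference between shadowing and replacing can be sketched as follows, assuming CPython 3. The demo function and the namespace dicts are hypothetical stand-ins for a module's globals, and the builtin is restored right away because patching builtins affects the whole process:

```python
import builtins
from collections import OrderedDict

real_dict = builtins.dict


def demo():
    results = {}

    # Shadowing: one namespace gets a global named "dict"; builtins is untouched.
    module_globals = {"__builtins__": builtins, "dict": OrderedDict}
    results["shadowed_lookup"] = eval("dict", module_globals)
    results["builtin_after_shadowing"] = builtins.dict

    # Replacing: rebind the attribute on the builtins module itself.
    builtins.dict = OrderedDict
    try:
        fresh_globals = {"__builtins__": builtins}
        results["replaced_lookup"] = eval("dict", fresh_globals)
        # The comprehension never looks up the name, so it is unaffected.
        results["comprehension_type"] = eval(
            "type({i: i * i for i in range(3)})", fresh_globals
        )
    finally:
        builtins.dict = real_dict  # always undo the patch
    return results


for label, value in demo().items():
    print(label, value)
```

Even with the builtin name replaced, the comprehension still yields the original dict type, which matches the point that the name is never consulted.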



Answer 2:

Sorry, not possible. Dict literals and dict comprehensions map to the built-in dict type, in a way that's hardcoded at the C level. That can't be overridden.

You can use this as an alternative, though:

OrderedDict((i, i * i) for i in range(3))

Addendum: in CPython 3.6, dicts preserve insertion order as an implementation detail, and as of Python 3.7 that behavior is part of the language spec. If you're using those versions and only need insertion order, there's no need for OrderedDict: the dict comprehension will Just Work (TM).
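
A quick sketch of what that means in practice, assuming Python 3.7 or later. The variable names are made up for illustration, and the second half shows two places where OrderedDict still behaves differently from a plain dict:

```python
from collections import OrderedDict

squares = {i: i * i for i in (3, 1, 2)}
print(list(squares))  # keys come back in insertion order on Python 3.7+

first = OrderedDict([(1, "x"), (2, "y")])
second = OrderedDict([(2, "y"), (1, "x")])
ordered_equal = first == second            # OrderedDict compares order too
plain_equal = dict(first) == dict(second)  # plain dicts ignore order

first.move_to_end(1)  # reordering method that plain dict lacks
print(ordered_equal, plain_equal, list(first))
```

So a plain dict comprehension is enough when only insertion order matters; OrderedDict remains relevant when order-sensitive equality or reordering is needed.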



Answer 3:

Slightly modifying @Max Noel's response: you can use a list comprehension instead of a generator expression to build an OrderedDict in order (which, of course, is not possible with a dict comprehension).

>>> OrderedDict([(i, i * i) for i in range(5)])
OrderedDict([(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)])