Creating a namedtuple object using only a subset o

Posted 2019-02-20 05:58

Question:

I am pulling rows from a MySQL database as dictionaries (using SSDictCursor) and doing some processing, using the following approach:

from collections import namedtuple

class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()

    def __init__(self, *args):
        super(Foo, self).__init__(self, *args)

    # ...some class methods below here

class Bar(namedtuple('Bar', ['id', 'address', 'city', 'state'])):
    __slots__ = ()

    def __init__(self, *args):
        super(Bar, self).__init__(self, *args)

    # some class methods here...

# more classes for distinct processing tasks...

To use namedtuple, I have to know exactly the fields I want beforehand, which is fine. However, I would like to allow the user to feed a simple SELECT * statement into my program, which will then iterate through the rows of the result set, performing multiple tasks using these different classes. In order to make this work, my classes have to somehow examine the N fields coming in from the cursor and take only the particular subset M < N corresponding to the names expected by the namedtuple definition.

My first thought was to try writing a single decorator that I could apply to each of my classes, which would examine the class to see what fields it was expecting, and pass only the appropriate arguments to the new object. But I've just started reading about decorators in the past few days, and I'm not that confident yet with them.

So my question is in two parts:

  1. Is this possible to do with a single decorator, that will figure out which fields are needed by the specific class being decorated?
  2. Is there an alternative with the same functionality that will be easier to use, modify and understand?

I have too many potential permutations of tables and fields, with millions of rows in each result set, to just write one all-purpose namedtuple subclass to deal with each different task. Query time and available memory have proven to be limiting factors.

If needed:

>>> sys.version
'2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)]'

Answer 1:

First, you have to override __new__ in order to customize namedtuple creation, because a namedtuple's __new__ method checks its arguments before you even get to __init__.
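To see why __init__ is too late, here is a minimal sketch. The class name Plain and the init_called list are made up for illustration; the point is only that an unexpected keyword is rejected inside __new__, so the __init__ override never runs.

```python
from collections import namedtuple

init_called = []  # records whether __init__ was ever reached


class Plain(namedtuple('Plain', ['id', 'name', 'age'])):
    __slots__ = ()

    def __init__(self, *args, **kwargs):
        # Only reached when the inherited __new__ has already succeeded.
        init_called.append(True)


try:
    # 'spam' is not a declared field, so the generated __new__ raises.
    Plain(id=1, name='a', age=30, spam=4)
    rejected = False
except TypeError:
    rejected = True
```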

Second, if your goal is to accept and filter keyword arguments, you need to take **kwargs, filter it, and pass the result through, not just *args.

So, putting it together:

class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()

    def __new__(cls, *args, **kwargs):
        kwargs = {k: v for k, v in kwargs.items() if k in cls._fields}
        return super(Foo, cls).__new__(cls, *args, **kwargs)

You could replace that dict comprehension with itemgetter, but every time I use itemgetter with multiple keys, nobody understands what it means, so I've reluctantly stopped using it.
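For reference, the itemgetter variant might look roughly like this. It is a sketch applied directly to a dict row rather than inside __new__; the helper name pick_foo_fields and the sample row are assumptions for illustration.

```python
from collections import namedtuple
from operator import itemgetter

Foo = namedtuple('Foo', ['id', 'name', 'age'])

# itemgetter with several keys returns a tuple of the values, in key order.
pick_foo_fields = itemgetter(*Foo._fields)

# Hypothetical row with one column Foo does not declare.
row = {'id': 1, 'name': 'Ann', 'age': 30, 'city': 'Oslo'}
foo = Foo(*pick_foo_fields(row))
```

Two caveats: itemgetter raises KeyError if the row lacks any of the fields, and with a single key it returns the bare value instead of a 1-tuple, so this only works as written for namedtuples with two or more fields.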


You can also override __init__ if you have a reason to do so, because it will be called as soon as __new__ returns a Foo instance.

But you don't need to just for this, because the namedtuple's __init__ doesn't take any arguments or do anything; the values have already been set in __new__ (just as with tuple, and other immutable types). It looks like with CPython 2.7, you can actually call super(Foo, self).__init__(*args, **kwargs) and it'll just be ignored, but with PyPy 1.9 and CPython 3.3, you get a TypeError. At any rate, there's no reason to pass them, and nothing saying it should work, so don't do it even in CPython 2.7.

Note that your __init__ will get the unfiltered kwargs. If you want to change that, you could mutate kwargs in-place in __new__, instead of making a new dictionary. But I believe that still isn't guaranteed to do anything; it just makes it implementation-defined whether you get the filtered args or unfiltered, instead of guaranteeing the unfiltered.
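A small sketch of the unfiltered-kwargs behaviour, assuming the Foo definition from above plus an __init__ that only records what it receives (the seen_by_init dict is made up for illustration):

```python
from collections import namedtuple

seen_by_init = {}  # what __init__ was actually handed


class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()

    def __new__(cls, *args, **kwargs):
        # Rebinding kwargs here only affects this method's local name.
        kwargs = {k: v for k, v in kwargs.items() if k in cls._fields}
        return super(Foo, cls).__new__(cls, *args, **kwargs)

    def __init__(self, *args, **kwargs):
        # Gets the original call arguments, including the extra 'spam'.
        seen_by_init.update(kwargs)


foo = Foo(id=1, name='Ann', age=30, spam=4)
```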


So, can you wrap this up? Sure!

def LenientNamedTuple(name, fields):
    class Wrapper(namedtuple(name, fields)):
        __slots__ = ()
        def __new__(cls, *args, **kwargs):
            args = args[:len(fields)]
            kwargs = {k: v for k, v in kwargs.items() if k in fields}
            return super(Wrapper, cls).__new__(cls, *args, **kwargs)
    return Wrapper

Note that this has the advantage of not having to use the quasi-private/semi-documented _fields class attribute, because we already have fields as a parameter.

Also, while we're at it, I added a line to toss away any excess positional arguments, as suggested in a comment.


Now you just use it as you'd use namedtuple, and it automatically ignores any excess arguments:

class Foo(LenientNamedTuple('Foo', ['id', 'name', 'age'])):
    pass

print(Foo(id=1, name=2, age=3, spam=4))
print(Foo(1, 2, 3, 4, 5))
print(Foo(1, age=3, name=2, eggs=4))
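Tying this back to the question, each dict row from the cursor could presumably be unpacked with ** into every class that needs it. The sketch below repeats the factory so it runs on its own; the rows list is hypothetical data standing in for a SELECT * result, and __slots__ = () is added to the subclasses to keep instances small.

```python
from collections import namedtuple


def LenientNamedTuple(name, fields):
    # Same factory as in the answer, repeated so the sketch is self-contained.
    class Wrapper(namedtuple(name, fields)):
        __slots__ = ()

        def __new__(cls, *args, **kwargs):
            args = args[:len(fields)]
            kwargs = {k: v for k, v in kwargs.items() if k in fields}
            return super(Wrapper, cls).__new__(cls, *args, **kwargs)
    return Wrapper


class Foo(LenientNamedTuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()


class Bar(LenientNamedTuple('Bar', ['id', 'address', 'city', 'state'])):
    __slots__ = ()


# Hypothetical rows, shaped like what a dict cursor would yield.
rows = [
    {'id': 1, 'name': 'Ann', 'age': 30,
     'address': '1 Main St', 'city': 'Oslo', 'state': 'NA'},
    {'id': 2, 'name': 'Bob', 'age': 41,
     'address': '2 Side St', 'city': 'Bergen', 'state': 'NA'},
]

# Each class keeps only the columns it declares and ignores the rest.
foos = [Foo(**row) for row in rows]
bars = [Bar(**row) for row in rows]
```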


I've uploaded a test, replacing the dict comprehension with dict() on a genexpr for 2.6 compatibility (2.6 is the earliest version with namedtuple), but without the args truncating. It works with positional, keyword, and mixed args, including out-of-order keywords, in CPython 2.6.7, 2.7.2, 2.7.5, 3.2.3, 3.3.0, and 3.3.1, PyPy 1.9.0 and 2.0b1, and Jython 2.7b.
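The uploaded test itself is not reproduced here. As a rough reconstruction of what the 2.6-compatible variant described above might look like (dict() over a generator expression, no args truncation), under the assumption that nothing else changes:

```python
from collections import namedtuple


def LenientNamedTuple(name, fields):
    class Wrapper(namedtuple(name, fields)):
        __slots__ = ()

        def __new__(cls, *args, **kwargs):
            # dict() over a generator expression: same filter as the
            # dict comprehension, but valid syntax on Python 2.6 too.
            kwargs = dict((k, v) for k, v in kwargs.items() if k in fields)
            return super(Wrapper, cls).__new__(cls, *args, **kwargs)
    return Wrapper


Foo = LenientNamedTuple('Foo', ['id', 'name', 'age'])

# Mixed positional and out-of-order keyword arguments, plus one extra.
mixed = Foo(1, age=3, name=2, eggs=4)
```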



Answer 2:

A namedtuple type has an attribute _fields which is a tuple of the names of the fields in the object. You could use this to dig out the required fields from the database record.
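As a minimal sketch of that idea, assuming each database record arrives as a dict: the helper name from_record and the sample record are made up for illustration, while _fields and _make are the namedtuple attributes being relied on.

```python
from collections import namedtuple

Foo = namedtuple('Foo', ['id', 'name', 'age'])


def from_record(nt_type, record):
    """Build nt_type from a dict record, ignoring keys it does not declare."""
    # _fields gives the expected names in order; _make builds from an iterable.
    return nt_type._make(record[field] for field in nt_type._fields)


# Hypothetical record with more columns than Foo needs.
record = {'id': 7, 'name': 'Ann', 'age': 30,
          'address': '1 Main St', 'city': 'Oslo'}
foo = from_record(Foo, record)
```

This keeps the namedtuple classes untouched, at the cost of a KeyError when a record is missing one of the expected fields.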