I am pulling rows from a MySQL database as dictionaries (using SSDictCursor) and doing some processing, using the following approach:
from collections import namedtuple

class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()
    def __init__(self, *args):
        super(Foo, self).__init__(*args)
    # ...some class methods below here

class Bar(namedtuple('Bar', ['id', 'address', 'city', 'state'])):
    __slots__ = ()
    def __init__(self, *args):
        super(Bar, self).__init__(*args)
    # some class methods here...

# more classes for distinct processing tasks...
To use namedtuple, I have to know exactly the fields I want beforehand, which is fine. However, I would like to allow the user to feed a simple SELECT * statement into my program, which will then iterate through the rows of the result set, performing multiple tasks using these different classes. In order to make this work, my classes have to somehow examine the N fields coming in from the cursor and take only the particular subset M < N corresponding to the names expected by the namedtuple definition.
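For example (simplified, with made-up values), a SELECT * row comes back from the cursor as a dict with more keys than any one class needs, and I would like each class to silently take only the fields it knows about:

    row = {'id': 1, 'name': 'Alice', 'age': 30,
           'address': '1 Main St', 'city': 'Springfield', 'state': 'IL'}

    foo = Foo(**row)   # should use id, name, age and ignore the rest
    bar = Bar(**row)   # should use id, address, city, state and ignore the rest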
My first thought was to try writing a single decorator that I could apply to each of my classes, which would examine the class to see what fields it was expecting, and pass only the appropriate arguments to the new object. But I've just started reading about decorators in the past few days, and I'm not that confident yet with them.
So my question is in two parts:
- Is this possible to do with a single decorator, that will figure out which fields are needed by the specific class being decorated?
- Is there an alternative with the same functionality that will be easier to use, modify and understand?
I have too many potential permutations of tables and fields, with millions of rows in each result set, to just write one all-purpose namedtuple subclass to deal with each different task. Query time and available memory have proven to be limiting factors.
If needed:
>>> sys.version
'2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)]'
First, you have to override __new__ in order to customize namedtuple creation, because a namedtuple's __new__ method checks its arguments before you even get to __init__.
Second, if your goal is to accept and filter keyword arguments, you need to take **kwargs, filter it, and pass it through, not just *args.
So, putting it together:
class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
    __slots__ = ()
    def __new__(cls, *args, **kwargs):
        kwargs = {k: v for k, v in kwargs.items() if k in cls._fields}
        return super(Foo, cls).__new__(cls, *args, **kwargs)
You could replace that dict comprehension with itemgetter, but every time I use itemgetter with multiple keys, nobody understands what it means, so I've reluctantly stopped using it.
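For what it's worth, an itemgetter version might look roughly like this (just a sketch; note that it only handles keyword arguments, and unlike the filtering comprehension it raises KeyError if any expected field is missing):

    from collections import namedtuple
    from operator import itemgetter

    class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
        __slots__ = ()
        def __new__(cls, **kwargs):
            # pull the expected fields out of kwargs in field order;
            # this raises KeyError if any of them is absent
            values = itemgetter(*cls._fields)(kwargs)
            return super(Foo, cls).__new__(cls, *values)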
You can also override __init__ if you have a reason to do so, because it will be called as soon as __new__ returns a Foo instance. But you don't need to just for this, because the namedtuple's __init__ doesn't take any arguments or do anything; the values have already been set in __new__ (just as with tuple and other immutable types). It looks like with CPython 2.7 you actually can call super(Foo, self).__init__(*args, **kwargs) and it'll just be ignored, but with PyPy 1.9 and CPython 3.3 you get a TypeError. At any rate, there's no reason to pass them, and nothing says it should work, so don't do it even in CPython 2.7.
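If you do override __init__, a sketch of what that might look like (the important part being that it does side effects only and never forwards the arguments to super().__init__):

    class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
        __slots__ = ()
        def __new__(cls, *args, **kwargs):
            kwargs = {k: v for k, v in kwargs.items() if k in cls._fields}
            return super(Foo, cls).__new__(cls, *args, **kwargs)
        def __init__(self, *args, **kwargs):
            # self is already a fully-built tuple here; don't pass args/kwargs up
            super(Foo, self).__init__()
            print('created %r' % (self,))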
Note that your __init__ will get the unfiltered kwargs. If you want to change that, you could mutate kwargs in place in __new__, instead of making a new dictionary. But I believe that still isn't guaranteed to do anything; it just makes it implementation-defined whether you get the filtered args or unfiltered, instead of guaranteeing the unfiltered.
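If you wanted to try that anyway, the in-place version might look something like this (a sketch, with the caveat above that it still isn't guaranteed to change what __init__ sees):

    class Foo(namedtuple('Foo', ['id', 'name', 'age'])):
        __slots__ = ()
        def __new__(cls, *args, **kwargs):
            # filter the dict __new__ received in place instead of building a new one
            for k in list(kwargs):
                if k not in cls._fields:
                    del kwargs[k]
            return super(Foo, cls).__new__(cls, *args, **kwargs)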
So, can you wrap this up? Sure!
def LenientNamedTuple(name, fields):
    class Wrapper(namedtuple(name, fields)):
        __slots__ = ()
        def __new__(cls, *args, **kwargs):
            args = args[:len(fields)]
            kwargs = {k: v for k, v in kwargs.items() if k in fields}
            return super(Wrapper, cls).__new__(cls, *args, **kwargs)
    return Wrapper
Note that this has the advantage of not having to use the quasi-private/semi-documented _fields class attribute, because we already have fields as a parameter. Also, while we're at it, I added a line to toss away any excess positional arguments, as suggested in a comment.
Now you just use it as you'd use namedtuple, and it automatically ignores any excess arguments:
class Foo(LenientNamedTuple('Foo', ['id', 'name', 'age'])):
    pass

print(Foo(id=1, name=2, age=3, spam=4))
print(Foo(1, 2, 3, 4, 5))
print(Foo(1, age=3, name=2, eggs=4))
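(All three of those calls should print Foo(id=1, name=2, age=3): the extra positional arguments are truncated and the unknown keywords are filtered out.)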
I've uploaded a test, replacing the dict comprehension with dict() on a genexpr for 2.6 compatibility (2.6 is the earliest version with namedtuple), but without the args truncation. It works with positional, keyword, and mixed args, including out-of-order keywords, in CPython 2.6.7, 2.7.2, 2.7.5, 3.2.3, 3.3.0, and 3.3.1, PyPy 1.9.0 and 2.0b1, and Jython 2.7b.
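For reference, that 2.6-friendly variant presumably looks something like this (the same factory, just swapping the comprehension for dict() over a generator expression and dropping the args-truncating line):

    from collections import namedtuple

    def LenientNamedTuple(name, fields):
        class Wrapper(namedtuple(name, fields)):
            __slots__ = ()
            def __new__(cls, *args, **kwargs):
                # dict() over a generator expression instead of a dict
                # comprehension, so the same code runs on Python 2.6
                kwargs = dict((k, v) for k, v in kwargs.items() if k in fields)
                return super(Wrapper, cls).__new__(cls, *args, **kwargs)
        return Wrapper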
A namedtuple type has an attribute _fields, which is a tuple of the names of the fields in the object. You could use this to dig out the required fields from the database record.
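For example (a sketch, assuming each row comes back from SSDictCursor as a plain dict):

    from collections import namedtuple

    Foo = namedtuple('Foo', ['id', 'name', 'age'])

    row = {'id': 1, 'name': 'Alice', 'age': 30, 'city': 'Springfield'}  # one SELECT * row
    foo = Foo(**dict((f, row[f]) for f in Foo._fields))                 # keeps only id, name, age
    print(foo)  # Foo(id=1, name='Alice', age=30)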