Question:
Long story short
PEP-557 introduced data classes into the Python standard library, which can fill basically the same role as collections.namedtuple and typing.NamedTuple. Now I'm wondering how to identify the use cases in which namedtuple is still the better solution.
Data classes advantages over NamedTuple
Of course, all the credit goes to dataclass if we need:
- mutable objects
- inheritance support
- property decorators, manageable attributes
- generated method definitions out of the box or customizable method definitions
Data classes advantages are briefly explained in the same PEP: Why not just use namedtuple.
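As a rough illustration of the advantages listed above, here is a minimal sketch. The class names Page and TitledPage and the area property are made up for this example and do not come from the PEP.

```python
from dataclasses import dataclass


@dataclass
class Page:
    width: int
    height: int

    @property
    def area(self) -> int:
        # Derived, read-only attribute computed from the fields.
        return self.width * self.height


@dataclass
class TitledPage(Page):
    # Inheritance: the subclass appends its own field to the parent's fields.
    title: str = "untitled"


page = TitledPage(width=210, height=297)
page.width = 100                      # mutation is allowed by default
print(page)                           # generated __repr__
print(page.area)                      # 29700
print(page == TitledPage(100, 297))   # generated __eq__ -> True
```

The same layout with typing.NamedTuple would reject the assignment to page.width, since tuples are immutable.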
Q: In which cases is namedtuple still a better choice?
But how about the opposite question for namedtuples: why not just use dataclass?
I guess namedtuple is probably better from a performance standpoint, but I have found no confirmation of that yet.
Example
Let's consider the following situation:
We are going to store page dimensions in a small container with statically defined fields, type hinting and named access. No further hashing, comparing and so on is needed.
NamedTuple approach:
from typing import NamedTuple
PageDimensions = NamedTuple("PageDimensions", [('width', int), ('height', int)])
DataClass approach:
from dataclasses import dataclass
@dataclass
class PageDimensions:
    width: int
    height: int
Which solution is preferable and why?
P.S. the question isn't a duplicate of that one in any way, because here I'm asking about the cases in which namedtuple is better, not about the difference (I've checked docs and sources before asking)
Answer 1:
It depends on your needs. Each of them has its own benefits.
Here is a good explanation of dataclasses from PyCon 2018: Raymond Hettinger - Dataclasses: The code generator to end all code generators
In a dataclass, all of the implementation is written in Python, whereas in a NamedTuple all of these behaviors come for free because NamedTuple inherits from tuple. The tuple structure is written in C, which is why the standard methods (hash, comparison, etc.) are faster in NamedTuple.
A dataclass, however, is based on a dict, whereas NamedTuple is based on a tuple, so each structure has its own advantages and disadvantages. For example, space usage is smaller with NamedTuple, but attribute access is faster with a dataclass.
Please, see my experiment:
In [33]: a = PageDimensionsDC(width=10, height=10)
In [34]: sys.getsizeof(a) + sys.getsizeof(vars(a))
Out[34]: 168
In [35]: %timeit a.width
43.2 ns ± 1.05 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [36]: a = PageDimensionsNT(width=10, height=10)
In [37]: sys.getsizeof(a)
Out[37]: 64
In [38]: %timeit a.width
63.6 ns ± 1.33 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
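For anyone who wants to repeat the measurement outside IPython, a minimal sketch using only the standard library follows. The class definitions are assumed (the session above does not show them), and the numbers will differ between machines and Python versions, so no expected output is given.

```python
import sys
import timeit
from dataclasses import dataclass
from typing import NamedTuple


class PageDimensionsNT(NamedTuple):
    width: int
    height: int


@dataclass
class PageDimensionsDC:
    width: int
    height: int


nt = PageDimensionsNT(width=10, height=10)
dc = PageDimensionsDC(width=10, height=10)

# Memory: a dataclass instance keeps its fields in a per-instance __dict__,
# so count both the object and that dict (shallow sizes only).
nt_size = sys.getsizeof(nt)
dc_size = sys.getsizeof(dc) + sys.getsizeof(vars(dc))

# Attribute access: total seconds for `number` lookups of obj.width.
number = 200_000
nt_time = timeit.timeit("obj.width", globals={"obj": nt}, number=number)
dc_time = timeit.timeit("obj.width", globals={"obj": dc}, number=number)

print(f"NamedTuple: {nt_size} bytes, {nt_time / number * 1e9:.1f} ns per access")
print(f"dataclass:  {dc_size} bytes, {dc_time / number * 1e9:.1f} ns per access")
```

It is worth re-running this on the interpreter you actually deploy, because the relative access times are an implementation detail and may not match the figures quoted here.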
But as the number of attributes of a NamedTuple grows, access time stays equally small, because for each attribute it creates a property with the name of the attribute. For example, in our case part of the namespace of the new class will look like:
from operator import itemgetter
class_namespace = {
    # ...
    'width': property(itemgetter(0), doc="Alias for field number 0"),
    'height': property(itemgetter(1), doc="Alias for field number 1"),
}
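To make the property-per-field idea concrete, here is a hand-rolled sketch of a tuple subclass built the same way. It only illustrates the mechanism described in this answer and is not the actual collections.namedtuple source.

```python
from operator import itemgetter


class PageDimensions(tuple):
    """Hand-rolled stand-in for a two-field named tuple (illustrative only)."""

    __slots__ = ()  # no per-instance __dict__, the data lives in the tuple

    def __new__(cls, width, height):
        return super().__new__(cls, (width, height))

    # Each field is a read-only property that fetches one tuple position.
    width = property(itemgetter(0), doc="Alias for field number 0")
    height = property(itemgetter(1), doc="Alias for field number 1")


p = PageDimensions(210, 297)
print(p.width, p.height)   # 210 297
print(p[1] == p.height)    # True
```

Because every lookup goes straight to a fixed index, the cost of p.width does not depend on how many fields the class has.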
In which cases namedtuple is still a better choice?
When your data structure needs to be (or can be) immutable, hashable, iterable, unpackable and comparable, you can use NamedTuple. If you need something more complicated, for example inheritance for your data structure, then use Dataclass.
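A short sketch of those tuple behaviours, reusing the PageDimensions example from the question (the a4 and a3 values are just sample data):

```python
from typing import NamedTuple


class PageDimensions(NamedTuple):
    width: int
    height: int


a4 = PageDimensions(210, 297)
a3 = PageDimensions(297, 420)

width, height = a4        # unpackable
as_list = list(a4)        # iterable
names = {a4: "A4"}        # hashable, so usable as a dict key
is_smaller = a4 < a3      # compared field by field, like a plain tuple

try:
    a4.width = 100        # immutable: assignment is rejected
    mutated = True
except AttributeError:
    mutated = False

print(width, height, as_list, names[a4], is_smaller, mutated)
# 210 297 [210, 297] A4 True False
```

A plain @dataclass supports none of these out of the box except equality; each one has to be switched on or written by hand.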
Answer 2:
In programming in general, anything that CAN be immutable SHOULD be immutable. We gain two things:
- Easier to read the program: we don't need to worry about values changing; once it's instantiated, it'll never change (namedtuple)
- Less chance for weird bugs
That's why, if the data is immutable, you should use a named tuple instead of a dataclass.
I wrote it in the comment, but I'll mention it here:
You're definitely right that there is an overlap, especially with frozen=True in dataclasses, but there are still features such as unpacking that belong to namedtuples, and a namedtuple is always immutable. I doubt they'll remove namedtuples as such.
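To show where the overlap ends, here is a small sketch comparing a frozen dataclass with a named tuple. The class names FrozenPage and TuplePage are invented for the example.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import NamedTuple


@dataclass(frozen=True)
class FrozenPage:
    width: int
    height: int


class TuplePage(NamedTuple):
    width: int
    height: int


frozen = FrozenPage(210, 297)
tup = TuplePage(210, 297)

# Both reject attribute assignment.
try:
    frozen.width = 100
    frozen_is_readonly = False
except FrozenInstanceError:
    frozen_is_readonly = True

try:
    tup.width = 100
    tuple_is_readonly = False
except AttributeError:
    tuple_is_readonly = True

# Only the named tuple unpacks out of the box.
w, h = tup
try:
    w2, h2 = frozen
    frozen_unpacks = True
except TypeError:
    frozen_unpacks = False

# With frozen=True and the default eq=True, a __hash__ is generated as well.
same_hash = hash(frozen) == hash(FrozenPage(210, 297))

print(frozen_is_readonly, tuple_is_readonly, frozen_unpacks, same_hash)
# True True False True
```

So frozen=True covers immutability and hashing, while unpacking, indexing and iteration remain tuple features unless you add them yourself.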
Answer 3:
I had this same question, so I ran a few tests and documented them here:
https://shayallenhill.com/python-struct-options/
The gist is that namedtuple is better for unpacking, exploding, and size. Dataclass is faster and more flexible.
The differences aren't tremendous, and I wouldn't refactor stable code to move from one to another.
Answer 4:
It is a little tricky to overload magic methods in NamedTuple, which stems from limitations via the underlying metaclass. In such cases, a dataclass can work.
Here we overload the equality method to handle float comparisons.
Given
import math
import typing as typ
import dataclasses as dc
import statistics as stats
data = [11, 9, 7, 25, 38, 9]
Code
class StatsResultsNT(typ.NamedTuple):
    mean: float
    mode: float
    std: float

@dc.dataclass
class StatsResultsDC:
    mean: float
    mode: float
    std: float

    # Overloads
    def __iter__(self):
        yield from dc.astuple(self)

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return all([math.isclose(a, b, rel_tol=1e-3) for a, b in zip(self, other)])
        return NotImplemented
Demo
a = StatsResultsNT(stats.mean(data), stats.mode(data), stats.stdev(data))
b = StatsResultsNT(16.5, 9, 12.3895)
assert a == b
# AssertionError:
c = StatsResultsDC(stats.mean(data), stats.mode(data), stats.stdev(data))
d = StatsResultsDC(16.5, 9, 12.3895)
assert c == d
Dataclasses don't unpack like NamedTuple, so implement __iter__. You may also wish to turn on the last two parameters in @dc.dataclass() to emulate a tuple.