Many Python builtin "functions" are actually classes, even though a straightforward function implementation would also work. This is true even of very simple ones, such as itertools.repeat. What is the motivation for this? It seems like over-engineering to me.
Edit: I am not asking about the purpose of itertools.repeat or any other particular function. It was just an example of a very simple function with a very simple possible implementation:
    def repeat(x):
        while True:
            yield x
But itertools.repeat is not actually a function; it's implemented as a class. My question is: Why? It seems like unnecessary overhead.
Also, I understand that classes are callable just like functions, and how you can emulate function-like behavior using a class. But I don't understand why this is so widely used throughout the standard library.
Implementing the itertools tools as classes has some advantages that generator functions don't have. For example:
- CPython implements these built-ins at the C layer, and at the C layer, a generator "function" is best implemented as a class implementing __next__ that preserves state as instance attributes; yield-based generators are a Python layer nicety, and really, they're just instances of the generator class (so they're actually still class instances, like everything else in Python)
- Generators aren't pickleable or copyable, and don't have a "story" for making them support either behavior (the internal state is too complex and opaque to generalize it); a class can define __reduce__/__copy__/__deepcopy__ (and if it's a Python level class, it probably doesn't even need to do that; it will work automatically) and make the instances pickleable/copyable (so if you have already generated 5 elements from a range iterator, you can copy or pickle/unpickle it, and get an iterator the same distance along in iteration)
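Both points can be sketched in pure Python. This is only an illustration, not CPython's C implementation: `Repeat` and `repeat_gen` are hypothetical names, and the `times` argument is required here just to keep the sketch short.

```python
import copy
import pickle


class Repeat:
    """Hypothetical pure-Python stand-in for itertools.repeat (illustration only)."""

    def __init__(self, value, times):
        self.value = value
        self.times = times  # remaining count, kept as plain instance state

    def __iter__(self):
        return self

    def __next__(self):
        if self.times <= 0:
            raise StopIteration
        self.times -= 1
        return self.value


def repeat_gen(value, times):
    """Generator-based equivalent; its state lives in an opaque frame."""
    for _ in range(times):
        yield value


it = Repeat("x", 5)
next(it)
next(it)                      # two items consumed, three remain
clone = copy.copy(it)         # instance attributes are copied automatically
clone_items = list(clone)     # the clone resumes where the original was

gen = repeat_gen("x", 5)
next(gen)
try:
    copy.copy(gen)
    generator_copyable = True
except TypeError:
    generator_copyable = False  # generator objects refuse to be copied

# The built-in range iterator is a class that supports pickling,
# so a partially consumed one survives a pickle round trip.
r = iter(range(10))
for _ in range(5):
    next(r)
restored = pickle.loads(pickle.dumps(r))
restored_items = list(restored)
```

The class-based iterator gets copy support for free because its whole state is two ordinary attributes; the generator's state is a suspended frame, which the copy and pickle machinery declines to handle.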
For non-generator tools, the reasons are usually similar. Classes can be given state and customized behaviors that a function can't. They can be inherited from (if that's desired, but C layer classes can prohibit subclassing if they're "logically" functions).
It's also useful for dynamic instance creation; if you have an instance of an unknown class but a known prototype (say, the sequence constructors that take an iterable, or chain or whatever), and you want to convert some other type to that class, you can do type(unknown)(constructorarg); if it's a generator, type(unknown) is useless, you can't use it to make more of itself because you can't introspect to figure out where it came from (not in reasonable ways).
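A small sketch of that pattern, assuming a hypothetical helper named `rebuild_like` and templates whose types accept a single iterable argument:

```python
from itertools import chain


def rebuild_like(template, items):
    """Hypothetical helper: build a new object of template's type from items.

    Assumes type(template) can be called with a single iterable argument.
    """
    return type(template)(items)


as_tuple = rebuild_like((1, 2), [3, 4])
as_list = rebuild_like([], (3, 4))
as_chain = rebuild_like(chain(), ["ab", "cd"])  # a new chain over one iterable
chain_type_name = type(as_chain).__name__       # the type name identifies the tool


def numbers():
    yield 1


try:
    rebuild_like(numbers(), [3, 4])
    generator_rebuilt = True
except TypeError:
    generator_rebuilt = False  # the generator type cannot be instantiated directly

generator_type_name = type(numbers()).__name__  # says nothing about numbers()
```

The type of a generator object is the one shared generator type, so it carries no link back to the function that produced it.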
And beyond that, even if you never use the features for programming logic, what would you rather see in the interactive interpreter or when print-debugging type(myiter): <class 'generator'>, which gives no hints as to origin, or <class 'itertools.repeat'>, which tells you exactly what you have and where it came from?
Both functions and classes are callables, so they can be used interchangeably in higher-order functions, for example.
$ python2
...
>>> map(dict, [["ab"], ["cd"], ["ef"]])
[{'a': 'b'}, {'c': 'd'}, {'e': 'f'}]
>>> map(lambda x: dict(x), [["ab"], ["cd"], ["ef"]])
[{'a': 'b'}, {'c': 'd'}, {'e': 'f'}]
That said, classes can also define methods that you can later call on the returned objects. For instance, the dict class defines the .get() method for dictionaries, etc.
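The same interchangeability holds in Python 3, with the difference that map returns a lazy iterator there, so the sketch below wraps it in list(). The variable names are illustrative only.

```python
pairs = [["ab"], ["cd"], ["ef"]]

# A class and a function wrapping it are interchangeable as callables.
via_class = list(map(dict, pairs))
via_lambda = list(map(lambda x: dict(x), pairs))

# Because dict is a class, the results carry methods such as .get().
first = via_class[0]
found = first.get("a")
fallback = first.get("z", "default")
```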
In the case of itertools.repeat (and most iterators), using a proper class implementing the iterator protocol has a few advantages from the implementation / maintenance POV: you have better control over the iteration, you can specialize the class, etc. I also suspect that there are some optimisations that can be done at C-level for proper iterators that don't apply to generators.
Also remember that classes and functions are objects too - the def statement is mostly syntactic sugar for creating a function instance and populating it with compiled code, local namespace, cells, closures and whatnot (a somewhat involved task FWIW, I did it once just out of curiosity and it was a major PITA), and the class statement is also syntactic sugar for creating a new type instance (doing it manually happens to be really trivial actually). From this POV, yield is a similar syntactic sugar that turns your function into a factory returning instances of the generic generator builtin type - IOW it makes your function act like a class, without the hassle of writing a full-blown class but also without the fine control and possible optimisations you can get by writing a full-blown class.
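A rough sketch of what those statements amount to, using only the standard types module. The names `Point`, `add_one` and `count_up` are made up for illustration, and the function case is simplified by reusing an existing code object rather than assembling one by hand:

```python
import types

# `class Point: dims = 2` is roughly what this three-argument type() call does.
Point = type("Point", (object,), {"dims": 2})


def add_one(x):
    return x + 1


# A function object is a code object plus a globals dict (and more);
# here we reuse an existing code object rather than building one by hand.
add_one_clone = types.FunctionType(add_one.__code__, globals(), "add_one_clone")
clone_result = add_one_clone(41)


def count_up():
    yield 1
    yield 2


# Calling a function that contains `yield` returns an instance of the
# generic generator type, much like calling a class returns an instance.
gen = count_up()
is_generator = isinstance(gen, types.GeneratorType)
```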
On a more general level, sometimes writing your "function" as a custom callable type instead offers similar gains - fine control, possible optimisations, and well sometimes just better readability (think of two-step decorators, custom descriptors etc).
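As one possible illustration of the two-step decorator case, here is a minimal sketch of a decorator written as a callable class. The `retry` class and the `flaky` function are hypothetical, and the sketch assumes `times` is at least 1:

```python
import functools


class retry:
    """Hypothetical two-step decorator written as a callable class."""

    def __init__(self, times):
        # Step 1: the decorator arguments are stored as instance state.
        self.times = times  # assumed to be >= 1

    def __call__(self, func):
        # Step 2: the instance is called with the function to wrap.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(self.times):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    last_error = exc
            raise last_error

        return wrapper


attempts = []


@retry(times=3)
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ValueError("not yet")
    return "ok"


result = flaky()
```

The nested-closure version needs three levels of def; the class keeps the configuration in __init__ and the wrapping in __call__, which some find easier to read.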
Finally, wrt/ builtin types (int, str etc), IIRC (please someone correct me if I'm wrong) they originally were factory functions (before the new-style classes revolution, when builtin types and user-defined types were different kinds of objects). It of course makes sense to have them as plain classes now, but they had to keep the all_lower naming scheme for compatibility.