Why are many Python built-in/standard library func

Posted 2020-07-10 10:48

Question:

Many Python builtin "functions" are actually classes, although they also have a straightforward function implementation. Even very simple ones, such as itertools.repeat. What is the motivation for this? It seems like over-engineering to me.

Edit: I am not asking about the purpose of itertools.repeat or any other particular function. It was just an example of a very simple function with a very simple possible implementation:

def repeat(x):
    while True: yield x

But itertools.repeat is not actually a function, it's implemented as a class. My question is: Why? It seems like unnecessary overhead.

I also understand that classes are callable and that you can emulate function-like behavior using a class. But I don't understand why this is so widely used throughout the standard library.

Answer 1:

Implementing as a class for itertools has some advantages that generator functions don't have. For example:

  1. CPython implements these built-ins at the C layer, and at the C layer, a generator "function" is best implemented as a class implementing __next__ that preserves state as instance attributes; yield based generators are a Python layer nicety, and really, they're just an instance of the generator class (so they're actually still class instances, like everything else in Python)
  2. Generators aren't pickleable or copyable, and don't have a "story" for supporting either behavior (the internal state is too complex and opaque to generalize); a class can define __reduce__/__copy__/__deepcopy__ (and if it's a Python level class, it probably doesn't even need to do that; it will work automatically) and make the instances pickleable/copyable (so if you have already generated 5 elements from a range iterator, you can copy or pickle/unpickle it, and get an iterator the same distance along in iteration)
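To make point 2 concrete, here is a minimal sketch. CountFrom and count_from are hypothetical names made up for this illustration, not standard library objects. The Python-level class copies without any extra code, while the equivalent generator rejects both copy and pickle:

```python
import copy
import pickle


class CountFrom:
    """Hypothetical Python-level iterator; its state is a plain attribute."""

    def __init__(self, start=0):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        value = self.current
        self.current += 1
        return value


def count_from(start=0):
    """Generator equivalent; its state lives in a suspended frame."""
    while True:
        yield start
        start += 1


it = CountFrom()
for _ in range(5):              # consume 0..4
    next(it)

clone = copy.copy(it)           # no __copy__ needed for a plain Python class
first_from_clone = next(clone)  # continues where the original left off
first_from_original = next(it)  # the original is unaffected by the clone

gen = count_from()
next(gen)
errors = []
for operation in (copy.copy, pickle.dumps):
    try:
        operation(gen)
    except TypeError as exc:    # generators support neither operation
        errors.append(type(exc).__name__)

print(first_from_clone, first_from_original, errors)
```

Both the clone and the original resume at the same position, which is the "same distance along in iteration" behavior described above.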

For non-generator tools, the reasons are usually similar. Classes can be given state and customized behaviors that a function can't. They can be inherited from (if that's desired, but C layer classes can prohibit subclassing if they're "logically" functions).

It's also useful for dynamic instance creation; if you have an instance of an unknown class but a known prototype (say, the sequence constructors that take an iterable, or chain or whatever), and you want to convert some other type to that class, you can do type(unknown)(constructorarg); if it's a generator, type(unknown) is useless, you can't use it to make more of itself because you can't introspect to figure out where it came from (not in reasonable ways).
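A small sketch of the type(unknown)(constructorarg) idiom. The helper name rebuild_like and the generator squares are made up for this example:

```python
from collections import deque


def rebuild_like(prototype, items):
    """Build a new object of the same class as `prototype` from `items`."""
    return type(prototype)(items)


def squares(n):
    """Throwaway generator function used only for the comparison."""
    for i in range(n):
        yield i * i


as_tuple = rebuild_like((), [1, 2, 3])
as_deque = rebuild_like(deque(), [1, 2, 3])

gen = squares(3)
try:
    rebuild_like(gen, [1, 2, 3])    # type(gen) is the generic generator type
    generator_rebuilt = True
except TypeError:                   # it cannot be instantiated directly
    generator_rebuilt = False

print(as_tuple, as_deque, generator_rebuilt)
```

The class-based objects can reproduce their own kind; the generator cannot, because its type says nothing about the function that created it.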

And beyond that, even if you never use these features in your program logic, what would you rather see for type(myiter) in the interactive interpreter or while print debugging: <class 'generator'>, which gives no hint as to its origin, or <class 'itertools.repeat'>, which tells you exactly what you have and where it came from?
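The difference is easy to check. repeat_gen below is just the generator version from the question under a different name:

```python
import itertools


def repeat_gen(x):
    while True:
        yield x


class_based = itertools.repeat("x")
generator_based = repeat_gen("x")

print(type(class_based))      # <class 'itertools.repeat'>
print(type(generator_based))  # <class 'generator'>
```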



Answer 2:

Both functions and classes are callables, so they can be used interchangeably in higher-order functions, for example.

$ python2
... 
>>> map(dict, [["ab"], ["cd"], ["ef"]])
[{'a': 'b'}, {'c': 'd'}, {'e': 'f'}]
>>> map(lambda x: dict(x), [["ab"], ["cd"], ["ef"]])
[{'a': 'b'}, {'c': 'd'}, {'e': 'f'}]

That said, classes can also define methods that you can later call on the returned objects. For instance, the dict class defines the .get() method for dictionaries, etc.
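For reference, a Python 3 version of the session above might look like this. map() returns a lazy iterator there, so the sketch wraps it in list(); the variable names are only illustrative:

```python
pairs = [["ab"], ["cd"], ["ef"]]

# A class and a lambda are interchangeable as the callable passed to map().
via_class = list(map(dict, pairs))
via_lambda = list(map(lambda x: dict(x), pairs))

# Because dict is a class, the returned objects also carry methods such as .get().
values = [d.get("a", "missing") for d in via_class]

print(via_class)
print(values)
```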



Answer 3:

In the case of itertools.repeat (and most iterators), using a proper class implementing the iterator protocol has a few advantages from the implementation / maintenance POV - like you can have better control of the iteration, you can specialize the class etc. I also suspect that there are some optimisations that can be done at C-level for proper iterators that don't apply to generators.
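As a rough illustration of that extra control, here is a pure-Python stand-in written as a class. This is only a sketch with a made-up name (Repeat); it is not how CPython actually implements itertools.repeat in C:

```python
import operator


class Repeat:
    """Hypothetical pure-Python stand-in, not the real C implementation."""

    def __init__(self, obj, times=None):
        self.obj = obj
        self.remaining = times      # None means "repeat forever"

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining is None:
            return self.obj
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.obj

    def __length_hint__(self):
        # Lets consumers presize their result; a generator has no such hook.
        if self.remaining is None:
            return NotImplemented
        return max(self.remaining, 0)

    def __repr__(self):
        if self.remaining is None:
            return f"Repeat({self.obj!r})"
        return f"Repeat({self.obj!r}, {self.remaining})"


r = Repeat("x", 3)
next(r)
hint_after_one = operator.length_hint(r)
shown = repr(r)
rest = list(r)
print(hint_after_one, shown, rest)
```

A custom repr and a length hint are two things the class can offer that the three-line generator from the question cannot.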

Also remember that classes and functions are objects too - the def statement is mostly syntactic sugar for creating a function instance and populating it with compiled code, local namespace, cells, closures and whatnot (a somewhat involved task FWIW; I did it once just out of curiosity and it was a major PITA), and the class statement is also syntactic sugar for creating a new type instance (doing it manually happens to be really trivial, actually). From this POV, yield is similar syntactic sugar that turns your function into a factory returning instances of the generic generator builtin type - IOW it makes your function act like a class, without the hassle of writing a full-blown class but also without the fine control and possible optimisations you can get by writing a full-blown class.
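A quick sketch of both points. The names Point, init and describe are invented for the example:

```python
import types


def init(self, x):
    self.x = x


def describe(self):
    return f"Point({self.x})"


# Equivalent to a `class Point:` statement with these two methods.
Point = type("Point", (), {"__init__": init, "describe": describe})
p = Point(3)


def repeat(x):
    while True:
        yield x


g = repeat("a")

print(p.describe())
print(type(repeat) is types.FunctionType)   # def created a function instance
print(isinstance(g, types.GeneratorType))   # calling it returned a generator instance
```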

On a more general level, sometimes writing your "function" as a custom callable type instead offers similar gains - fine control, possible optimisations, and sometimes just better readability (think of two-step decorators, custom descriptors etc).
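For instance, a two-step decorator (one that takes arguments) can be written as a callable class instead of nested functions. This is a sketch with a made-up decorator name, repeat_call:

```python
import functools


class repeat_call:
    """Hypothetical decorator: call the wrapped function `times` times."""

    def __init__(self, times):
        # Step 1: the decorator arguments are stored on the instance.
        self.times = times

    def __call__(self, func):
        # Step 2: the instance is called with the function to wrap.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return [func(*args, **kwargs) for _ in range(self.times)]
        return wrapper


@repeat_call(times=3)
def greet(name):
    return f"hello {name}"


results = greet("world")
print(results)
```

The configuration lives in __init__ and the wrapping logic in __call__, which some people find easier to read than three levels of nested functions.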

Finally, wrt/ builtin types (int, str etc), IIRC (please someone correct me if I'm wrong) they originally were functions acting as factory functions (before the new-style classes revolution, when builtin types and user-defined types were different kinds of objects). It of course makes sense to have them as plain classes now, but they had to keep the all_lower naming scheme for compatibility.