Validating detailed types in python dataclasses

2019-01-19 02:30发布

问题:

Python 3.7 is around the corner, and I wanted to test some of the fancy new dataclass+typing features. Getting hints to work right is easy enough, with both native types and those from the typing module:

>>> import dataclasses
>>> import typing as ty
>>> 
... @dataclasses.dataclass
... class Structure:
...     a_str: str
...     a_str_list: ty.List[str]
...
>>> my_struct = Structure(a_str='test', a_str_list=['t', 'e', 's', 't'])
>>> my_struct.a_str_list[0].  # IDE suggests all the string methods :)

But one other thing that I wanted to try was forcing the type hints as conditions during runtime, i.e. it should not be possible for a dataclass with incorrect types to exist. It can be implemented nicely with __post_init__:

>>> @dataclasses.dataclass
... class Structure:
...     a_str: str
...     a_str_list: ty.List[str]
...     
...     def validate(self):
...         ret = True
...         for field_name, field_def in self.__dataclass_fields__.items():
...             actual_type = type(getattr(self, field_name))
...             if actual_type != field_def.type:
...                 print(f"\t{field_name}: '{actual_type}' instead of '{field_def.type}'")
...                 ret = False
...         return ret
...     
...     def __post_init__(self):
...         if not self.validate():
...             raise ValueError('Wrong types')

This kind of validate function works for native types and custom classes, but not those specified by the typing module:

>>> my_struct = Structure(a_str='test', a_str_list=['t', 'e', 's', 't'])
Traceback (most recent call last):
  a_str_list: '<class 'list'>' instead of 'typing.List[str]'
  ValueError: Wrong types

Is there a better approach to validate an untyped list with a typing-typed one? Preferably one that doesn't include checking the types of all elements in any list, dict, tuple, or set that is a dataclass' attribute.

回答1:

Instead of checking for type equality, you should use isinstance. But you cannot use a parametrized generic type (typing.List[int]) to do so, you must use the "generic" version (typing.List). So you will be able to check for the container type but not the contained types. Parametrized generic types define an __origin__ attribute that you can use for that.

Contrary to Python 3.6, in Python 3.7 most type hints have a useful __origin__ attribute. Compare:

# Python 3.6
>>> import typing
>>> typing.List.__origin__
>>> typing.List[int].__origin__
typing.List

and

# Python 3.7
>>> import typing
>>> typing.List.__origin__
<class 'list'>
>>> typing.List[int].__origin__
<class 'list'>

Notable exceptions being typing.Any, typing.Union and typing.ClassVar… Well, anything that is a typing._SpecialForm does not define __origin__. Fortunately:

>>> isinstance(typing.Union, typing._SpecialForm)
True
>>> isinstance(typing.Union[int, str], typing._SpecialForm)
False
>>> typing.Union[int, str].__origin__
typing.Union

But parametrized types define an __args__ attribute that store their parameters as a tuple:

>>> typing.Union[int, str].__args__
(<class 'int'>, <class 'str'>)

So we can improve type checking a bit:

for field_name, field_def in self.__dataclass_fields__.items():
    if isinstance(field_def.type, typing._SpecialForm):
        # No check for typing.Any, typing.Union, typing.ClassVar (without parameters)
        continue
    try:
        actual_type = field_def.type.__origin__
    except AttributeError:
        actual_type = field_def.type
    if isinstance(actual_type, typing._SpecialForm):
        # case of typing.Union[…] or typing.ClassVar[…]
        actual_type = field_def.type.__args__

    actual_value = getattr(self, field_name)
    if not isinstance(actual_value, actual_type):
        print(f"\t{field_name}: '{type(actual_value)}' instead of '{field_def.type}'")
        ret = False

This is not perfect as it won't account for typing.ClassVar[typing.Union[int, str]] or typing.Optional[typing.List[int]] for instance, but it should get things started.


Next is the way to apply this check.

Instead of using __post_init__, I would go the decorator route: this could be used on anything with type hints, not only dataclasses:

import inspect
import typing
from contextlib import suppress
from functools import wraps


def enforce_types(callable):
    spec = inspect.getfullargspec(callable)

    def check_types(*args, **kwargs):
        parameters = dict(zip(spec.args, args))
        parameters.update(kwargs)
        for name, value in parameters.items():
            with suppress(KeyError):  # Assume un-annotated parameters can be any type
                type_hint = spec.annotations[name]
                if isinstance(type_hint, typing._SpecialForm):
                    # No check for typing.Any, typing.Union, typing.ClassVar (without parameters)
                    continue
                try:
                    actual_type = type_hint.__origin__
                except AttributeError:
                    actual_type = type_hint
                if isinstance(actual_type, typing._SpecialForm):
                    # case of typing.Union[…] or typing.ClassVar[…]
                    actual_type = type_hint.__args__

                if not isinstance(value, actual_type):
                    raise TypeError('Unexpected type for \'{}\' (expected {} but found {})'.format(name, type_hint, type(value)))

    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            check_types(*args, **kwargs)
            return func(*args, **kwargs)
        return wrapper

    if inspect.isclass(callable):
        callable.__init__ = decorate(callable.__init__)
        return callable

    return decorate(callable)

Usage being:

@enforce_types
@dataclasses.dataclass
class Point:
    x: float
    y: float

@enforce_types
def foo(bar: typing.Union[int, str]):
    pass

Appart from validating some type hints as suggested in the previous section, this approach still have some drawbacks:

  • type hints using strings (class Foo: def __init__(self: 'Foo'): pass) are not taken into account by inspect.getfullargspec: you may want to use typing.get_type_hints and inspect.signature instead;
  • a default value which is not the appropriate type is not validated:

    @enforce_type
    def foo(bar: int = None):
        pass
    
    foo()
    

    does not raise any TypeError. You may want to use inspect.Signature.bind in conjuction with inspect.BoundArguments.apply_defaults if you want to account for that (and thus forcing you to define def foo(bar: typing.Optional[int] = None));

  • variable number of arguments can't be validated as you would have to define something like def foo(*args: typing.Sequence, **kwargs: typing.Mapping) and, as said at the beginning, we can only validate containers and not contained objects.

Thanks to @Aran-Fey that helped me improve this answer.