How to pythonically have partially-mutually exclus

Posted 2019-03-25 02:48

神经病院院长

Question:

As a simple example, take a class Ellipse that can return its properties such as area A, circumference C, major/minor axis a/b, eccentricity e, etc. In order to get those, one obviously has to provide precisely two of its parameters to obtain all the other ones, though as a special case providing only one parameter should assume a circle. Three or more parameters that are consistent should yield a warning but work; otherwise an exception should obviously be raised.

So some examples of valid Ellipses are:

Ellipse(a=5, b=2)
Ellipse(A=3)
Ellipse(a=3, e=.1)
Ellipse(a=3, b=3, A=9*math.pi)  # note the consistency

while invalid ones would be

Ellipse()
Ellipse(a=3, b=3, A=7)

The constructor would therefore either contain many =None arguments,

class Ellipse(object):
    def __init__(self, a=None, b=None, A=None, C=None, ...):

or, probably more sensible, a simple **kwargs, maybe adding the option to provide a,b as positional arguments,

class Ellipse(object):
    def __init__(self, a=None, b=None, **kwargs):
        kwargs.update({key: value
                       for key, value in (('a', a), ('b', b))
                       if value is not None})

So far, so good. But now comes the actual implementation, i.e. figuring out which parameters were provided and which were not, determining all the others from them, and checking for consistency where required.

My first approach would be a simple yet tedious combination of many if/elif branches:

if 'a' in kwargs:
    a = kwargs['a']
    if 'b' in kwargs:
        b = kwargs['b']
        A = kwargs['A'] = math.pi * a * b
        f = kwargs['f'] = math.sqrt(a**2 - b**2)
        ...
    elif 'f' in kwargs:
        f = kwargs['f']
        b = kwargs['b'] = math.sqrt(a**2 - f**2)  # since f**2 = a**2 - b**2
        A = kwargs['A'] = math.pi * a * b
        ...
    elif ...

and so on^*. But is there no better way? Or is this class design totally bollocks and I should create constructors such as Ellipse.create_from_a_b(a, b), despite that basically making the "provide three or more consistent parameters" option impossible?

Bonus question: Since the ellipse's circumference involves elliptic integrals (or elliptic functions if the circumference is provided and the other parameters are to be obtained) which are not exactly computationally trivial, should those calculations actually be in the constructor or rather be put into the @property Ellipse.C?

^* I guess at least one readability improvement would be always extracting a and b and calculating the rest from them but that means recalculating the values already provided, wasting both time and precision...

Answer 1:

My proposal is focused on data encapsulation and code readability.

a) Pick a pair of unambiguous measurements to represent the ellipse internally

class Ellipse(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

b) Create a family of properties to get the desired metrics of the ellipse

class Ellipse(object):
    @property
    def area(self):
        return math.pi * self.a * self.b

c) Create a factory class / factory methods with unambiguous names:

class Ellipse(object):
    @classmethod
    def fromAreaAndCircumference(cls, area, circumference):
        # convert area and circumference to common format
        return cls(a, b)

Sample usage:

ellipse = Ellipse.fromLongAxisAndEccentricity(axis, eccentricity)
assert ellipse.a == axis
assert math.isclose(ellipse.eccentricity, eccentricity)  # float round trip through b
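A runnable sketch of steps a) to c), assuming the internal pair is the semi-axes a >= b. The factory name `fromLongAxisAndEccentricity` is taken from the sample usage; `fromLongAxisAndArea` is a hypothetical extra factory added only for illustration.

```python
import math


class Ellipse(object):
    """Stores only the semi-axes a >= b; everything else is derived."""

    def __init__(self, a, b):
        self.a = a
        self.b = b

    @property
    def area(self):
        return math.pi * self.a * self.b

    @property
    def eccentricity(self):
        return math.sqrt(1 - (self.b / self.a) ** 2)

    @classmethod
    def fromLongAxisAndEccentricity(cls, axis, eccentricity):
        # e = sqrt(1 - (b/a)**2)  =>  b = a * sqrt(1 - e**2)
        return cls(axis, axis * math.sqrt(1 - eccentricity ** 2))

    @classmethod
    def fromLongAxisAndArea(cls, axis, area):
        # Hypothetical extra factory: A = pi * a * b  =>  b = A / (pi * a)
        return cls(axis, area / (math.pi * axis))
```

A factory like `fromAreaAndCircumference` is left out on purpose: recovering a and b from the circumference means numerically inverting an elliptic integral, which does not fit in a one-line conversion.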

Answer 2:

Check that you have enough parameters
Calculate a from every pairing of the other parameters
Confirm every a is the same
Calculate b from every pairing of a and another parameter
Calculate the other parameters from a and b
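The five steps above can also be sketched in a table-driven way. This is a hedged variant, not the answerer's code: `solve_ellipse` and `_TO_AB` are hypothetical names, each pair yields the whole (a, b) pair rather than a alone, floats are compared with `math.isclose` instead of exact equality, and the (e, f) pair cannot describe a circle because it divides by e.

```python
import itertools
import math

# Hypothetical table: each pair of parameter names -> function giving (a, b).
# a, b: semi-axes; e: eccentricity; f: centre-to-focus distance.
_TO_AB = {
    ('a', 'b'): lambda a, b: (a, b),
    ('a', 'e'): lambda a, e: (a, a * math.sqrt(1 - e**2)),
    ('a', 'f'): lambda a, f: (a, math.sqrt(a**2 - f**2)),
    ('b', 'e'): lambda b, e: (b / math.sqrt(1 - e**2), b),
    ('b', 'f'): lambda b, f: (math.sqrt(b**2 + f**2), b),
    # Divides by e, so this pair cannot describe a circle (e == 0)
    ('e', 'f'): lambda e, f: (f / e, f / e * math.sqrt(1 - e**2)),
}


def solve_ellipse(**params):
    """Return a, b, e and f computed from any two (or more) of them."""
    given = {k: v for k, v in params.items() if v is not None}
    unknown = set(given) - {'a', 'b', 'e', 'f'}
    if unknown:
        raise TypeError('Unknown parameters: %s' % sorted(unknown))
    # Step 1: check that there are enough parameters
    if len(given) < 2:
        raise ValueError('Need at least two of a, b, e, f')
    # Steps 2-4: derive (a, b) from every provided pair and compare them,
    # with a tolerance instead of exact float equality
    candidates = [_TO_AB[pair](*(given[name] for name in pair))
                  for pair in itertools.combinations(sorted(given), 2)]
    a, b = candidates[0]
    for other_a, other_b in candidates[1:]:
        if not (math.isclose(a, other_a) and math.isclose(b, other_b)):
            raise ValueError('Inconsistent parameters')
    # Step 5: everything else follows from a and b
    return {'a': a, 'b': b,
            'e': math.sqrt(1 - (b / a)**2),
            'f': math.sqrt(a**2 - b**2)}
```

Adding area or circumference would mean adding rows for the new pairs plus one line in the final dict.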

Here's a shortened version with just a, b, e, and f that easily extends to other parameters:

import math


class Ellipse():
    def __init__(self, a=None, b=None, e=None, f=None):
        if [a, b, e, f].count(None) > 2:
            raise Exception('Not enough parameters to make an ellipse')
        self.a, self.b, self.e, self.f = a, b, e, f
        self.calculate_a()
        for parameter in 'b', 'e', 'f':  # Allows any multi-character parameter names
            if self.__dict__[parameter] is None:
                Ellipse.__dict__['calculate_' + parameter](self)

    def calculate_a(self):
        """Calculate and compare a from every pair of other parameters

        :raises Exception: if the ellipse parameters are inconsistent
        """
        a_raw = 0 if self.a is None else self.a
        a_be = 0 if not all((self.b, self.e)) else self.b / math.sqrt(1 - self.e**2)
        a_bf = 0 if not all((self.b, self.f)) else math.sqrt(self.b**2 + self.f**2)
        a_ef = 0 if not all((self.e, self.f)) else self.f / self.e
        if len(set((a_raw, a_be, a_bf, a_ef)) - set((0,))) > 1:
            raise Exception('Inconsistent parameters')
        # All non-zero candidates are equal here; summing them would count a twice
        self.a = max(a_raw, a_be, a_bf, a_ef)

    def calculate_b(self):
        """Calculate and compare b from every pair of a and another parameter"""
        b_ae = 0 if self.e is None else self.a * math.sqrt(1 - self.e**2)
        b_af = 0 if self.f is None else math.sqrt(self.a**2 - self.f**2)
        self.b = max(b_ae, b_af)  # summing would double b if both e and f are given

    def calculate_e(self):
        """Calculate e from a and b"""
        self.e = math.sqrt(1 - (self.b / self.a)**2)

    def calculate_f(self):
        """Calculate f from a and b"""
        self.f = math.sqrt(self.a**2 - self.b**2)

It's pretty Pythonic, though the __dict__ usage might not be. The __dict__ way is fewer lines and less repetitive, but you can make it more explicit by breaking it out into separate if self.b is None: self.calculate_b() lines.

I only coded e and f, but it's extensible. Just mimic e and f code with the equations for whatever you want to add (area, circumference, etc.) as a function of a and b.

I didn't include your request for one-parameter Ellipses to become circles, but that's just a check at the beginning of calculate_a for whether there's only one parameter, in which case a should be set to make the ellipse a circle (b should be set if a is the only one):

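def calculate_a(self):
    """..."""
    if [self.a, self.b, self.e, self.f].count(None) == 3:
        # Exactly one parameter given: make a circle (a == b).
        # The check in __init__ must then be relaxed from > 2 to > 3.
        if self.a is None:
            self.a = self.b  # Set self.a to make a circle (assumes b was given)
        else:
            self.b = self.a  # Set self.b to make a circle
        return
    a_raw = ...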

Answer 3:

If the need for such functionality is limited to this single class, my advice would be to go with the second solution you mentioned, using Nsh's answer.

Otherwise, if this problem arises in a number of places in your project, here is a solution I came up with:

class YourClass(MutexInit):
    """First of all inherit the MutexInit class by..."""

    def __init__(self, **kwargs):
        """...calling its __init__ at the end of your own __init__. Then..."""
        super(YourClass, self).__init__(**kwargs)

    @sub_init
    def _init_foo_bar(self, foo, bar):
        """...just decorate each sub-init method with @sub_init"""
        self.baz = foo + bar

    @sub_init
    def _init_bar_baz(self, bar, baz):
        self.foo = bar - baz

This will make your code more readable, and you will hide the ugly details behind these decorators, which are self-explanatory.

Note: We could also eliminate the @sub_init decorator; however, I think it is the only clean way to mark a method as a sub-init. The alternative would be to agree on a name prefix for such methods, say _init, but I think that's a bad idea.

Here are the implementations:

import inspect


class MutexInit(object):
    def __init__(self, **kwargs):
        super(MutexInit, self).__init__()

        for arg in kwargs:
            setattr(self, arg, kwargs.get(arg))

        self._arg_method_dict = {}
        for attr_name in dir(self):
            attr = getattr(self, attr_name)
            if getattr(attr, "_isrequiredargsmethod", False):
                self._arg_method_dict[attr.args] = attr

        provided = {arg: value for arg, value in kwargs.items()
                    if value is not None}
        provided_args = tuple(sorted(provided))
        sub_init = self._arg_method_dict.get(provided_args, None)

        if sub_init:
            # Pass only the non-None arguments; the wrapper rejects None values
            sub_init(**provided)
        else:
            raise AttributeError('Insufficient arguments')


def sub_init(func):
    # inspect.getargspec was removed in Python 3.11; getfullargspec replaces it
    args = sorted(inspect.getfullargspec(func).args)
    self_arg = 'self'
    if self_arg in args:
        args.remove(self_arg)

    def wrapper(funcself, **kwargs):
        if len(kwargs) == len(args):
            for arg in args:
                if (arg not in kwargs) or (kwargs[arg] is None):
                    raise AttributeError
        else:
            raise AttributeError

        return func(funcself, **kwargs)
    wrapper._isrequiredargsmethod = True
    wrapper.args = tuple(args)

    return wrapper
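A stripped-down sketch of the same dispatch idea, written as a simplified re-implementation rather than the answerer's code: each decorated method records its argument names as a frozenset, and `__init__` picks the method whose set equals the non-None keyword names. It uses `inspect.signature` and looks methods up on the class, so instance properties are not evaluated during the scan.

```python
import inspect


def sub_init(func):
    """Mark func as a sub-initializer and record its argument names."""
    params = list(inspect.signature(func).parameters)[1:]  # drop 'self'
    func._sub_init_args = frozenset(params)
    return func


class MutexInit(object):
    def __init__(self, **kwargs):
        provided = {k: v for k, v in kwargs.items() if v is not None}
        # Scan the class (not the instance) for a matching sub-init
        for name in dir(type(self)):
            method = getattr(type(self), name)
            if getattr(method, '_sub_init_args', None) == frozenset(provided):
                for key, value in provided.items():
                    setattr(self, key, value)
                method(self, **provided)
                return
        raise TypeError('No sub-init accepts arguments %s' % sorted(provided))


class YourClass(MutexInit):
    @sub_init
    def _init_foo_bar(self, foo, bar):
        self.baz = foo + bar

    @sub_init
    def _init_bar_baz(self, bar, baz):
        self.foo = bar - baz
```

Keying on a frozenset makes the argument order irrelevant, which is what the sorted tuples achieve in the original.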

Answer 4:

Here's my try on it. If you're doing this for some end users, you might want to skip. What I did probably works well for setting up some fast math objects library, but only when the user knows what's going on.

The idea was that all variables describing a math object follow the same pattern, a=something*smntng.

So when calculating a variable in practice, in the worst case I would be missing "something"; then I'd go and calculate that value, and any values I'd be missing when calculating that one, and bring it back to finish calculating the original variable I was looking for. There's a noticeable recursion pattern.

Therefore, when calculating a variable, I have to check at each access whether it exists and, if it doesn't, calculate it. Since this happens at each access, I have to use __getattribute__.

I also need a functional relationship between the variables, so I'll attach an attribute relations which will serve just that purpose. It'll be a dict of variables and an appropriate function.

But I've also got to check in advance whether I have all the variables necessary to calculate the current one. So I'll amend my table of centralized math relations between variables to list all dependencies, and before I calculate anything, I'll run over the listed dependencies and calculate those if I need to.

So now it looks more like a ping-pong match of semi-recursion, where the function _calc calls __getattribute__, which calls _calc again, until we either run out of variables or actually calculate something.

The Good:

There are no ifs
Can initialize with different init variables, as long as the ones passed in allow the others to be calculated.
It's fairly generic and looks like it could work for any other mathematical object describable in a similar manner.
Once calculated all your variables will be remembered.

The Bad:

It's fairly "unpythonic" for whatever that word means to you (explicit is always better).
Not user friendly. Any error message you receive will be as long as the number of times __getattribute__ and _calc called each other. There is also no nice way of formulating a pretty error message.
You've a consistency issue at hand. This can probably be dealt with by overriding setters.
Depending on initial parameters, there is a possibility that you'll have to wait a long time to calculate a certain variable, especially if the requested variable calculation has to fall through several other calculations.
If you need a complex function, you have to make sure it's declared before relations, which might make the code ugly (also see the last point). I couldn't quite work out how to make them instance methods rather than class methods or some other more global functions, because I basically overrode the . operator.
Circular functional dependencies are a concern as well. (a needs b which needs e which needs a again and into an infinite loop).
relations is a dict. That means there's only one functional dependency you can have per variable name, which isn't necessarily true in mathematical terms.
It's already ugly: value = self.relations[var]["func"]( *[self.__getattribute__(x) for x in requirements["req"]] )

Also, that's the line in _calc that calls __getattribute__, which either calls _calc again or, if the variable exists, returns its value. Also, in each __init__ you have to set all your attributes to None, because otherwise __getattribute__ raises an AttributeError for them.

def cmplx_func_A(e, C):
    return 10*C*e

class Elipse():
    def __init__(self, a=None, b=None, **kwargs):
        self.relations = {
        "e": {"req":["a", "b"], "func": lambda a,b: a+b},
        "C": {"req":["e", "a"], "func": lambda e,a: e*1/(a*b)},
        "A": {"req":["C", "e"], "func": lambda e,C: cmplx_func_A(e, C)},
        "a": {"req":["e", "b"], "func": lambda e,b: e/b},
        "b": {"req":["e", "a"], "func": lambda e,a: e/a}
                   }
        self.a = a
        self.b = b
        self.e = None
        self.C = None
        self.A = None
        if kwargs:
            for key in kwargs:
                setattr(self, key, kwargs[key])

    def __getattribute__(self, attr):
        val = super(Elipse, self).__getattribute__(attr)
        if val is not None: return val  # 0 is a valid value, so test for None
        return self._calc(attr)

    def _calc(self, var):
        requirements = self.relations[var]
        value = self.relations[var]["func"](
            *[self.__getattribute__(x) for x in requirements["req"]]
            )
        setattr(self, var, value)
        return value

Output:

>>> a = Elipse(1,1)
>>> a.A #call to calculate this will fall through
        #and calculate every variable A depends on (C and e)
40.0
>>> a.C #C is not calculated this time.
2.0
>>> a = Elipse(1,1, e=3)
>>> a.e #without a __setattr__ checking the validity, there is no
3       #guarantee that this makes sense.
>>> a.A #calculates this and a.C, but doesn't recalc a.e
90.0
>>> a.e
3
>>> a = Elipse(b=1, e=2) #init can be anything that makes sense
>>> a.a                  #as it's defined by relations dict.
2.0
>>> a = Elipse(a=2, e=2) 
>>> a.b
1.0

There is one more issue here, related to the next-to-last point in "The Bad". Let's imagine that we can define an ellipse with C and A. Because each variable can be related to the others through only one functional dependency, if you defined your variables a and b over e and a|b like I have, you won't be able to calculate them. There will always be at least some small subset of variables you have to pass in. This can be alleviated by defining as many variables as possible in terms of as few other variables as possible, but it can't be avoided.

If you're lazy, this is a good way to short-circuit something you need done fast, but I wouldn't do this somewhere, where I expect someone else to use it, ever!
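A variation on the same idea, sketched under assumptions rather than taken from the answer: it swaps `__getattribute__` for `__getattr__`, which Python only calls when normal lookup fails, so attributes need not be pre-set to None. It also allows several alternative relations per variable and guards against circular dependencies. The class and its relation table are illustrative and use the standard ellipse formulas instead of the placeholder functions above.

```python
import math


class Ellipse(object):
    # Each target maps to a list of (required names, function) alternatives
    relations = {
        'a': [(('b', 'e'), lambda b, e: b / math.sqrt(1 - e**2)),
              (('A', 'b'), lambda A, b: A / (math.pi * b))],
        'b': [(('a', 'e'), lambda a, e: a * math.sqrt(1 - e**2)),
              (('A', 'a'), lambda A, a: A / (math.pi * a))],
        'e': [(('a', 'b'), lambda a, b: math.sqrt(1 - (b / a)**2))],
        'A': [(('a', 'b'), lambda a, b: math.pi * a * b)],
    }

    def __init__(self, **kwargs):
        self._resolving = set()  # names currently being computed (cycle guard)
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __getattr__(self, name):
        # Only reached for names that were neither given nor computed yet
        if name not in type(self).relations or name in self._resolving:
            raise AttributeError(name)
        self._resolving.add(name)
        try:
            for requirements, func in type(self).relations[name]:
                try:
                    args = [getattr(self, req) for req in requirements]
                except AttributeError:
                    continue  # this alternative is not computable; try the next
                value = func(*args)
                setattr(self, name, value)  # cache, so __getattr__ is skipped next time
                return value
        finally:
            self._resolving.discard(name)
        raise AttributeError('cannot derive %r from the given values' % name)
```

The error for an underdetermined ellipse is a single AttributeError instead of a long chain of mutual calls.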

Answer 5:

For the bonus question it's probably sensible (depending on your use case) to calculate on request but remember the computed value if it's been computed before. E.g.

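@property
def a(self):
    return self._calc_a()

def _calc_a(self):
    # Cache under a private name (initialised to None in __init__);
    # reading self.a here would call the property again and recurse forever
    if self._a is None:
        self._a = ...  # compute a here
    return self._a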
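On Python 3.8+, `functools.cached_property` packages this compute-once-then-remember pattern. The sketch below applies it to the expensive circumference from the bonus question, assuming the ellipse is stored as semi-axes a >= b > 0. It computes the complete elliptic integral of the second kind with the arithmetic-geometric mean (AGM) series; the class layout and the convergence tolerance are assumptions, not part of the answer.

```python
import math
from functools import cached_property


class Ellipse:
    def __init__(self, a, b):
        # Assumes a >= b > 0 (semi-major and semi-minor axes)
        self.a, self.b = a, b

    @property
    def A(self):
        # Cheap, so recomputing it on every access is fine
        return math.pi * self.a * self.b

    @cached_property
    def C(self):
        # Expensive: computed on first access, then stored in the instance
        # __dict__ so later accesses skip the computation.
        # AGM method: C = 2*pi/M(a, b) * (a**2 - sum(2**(n-1) * c_n**2)),
        # with c_0**2 = a**2 - b**2 and c_{n+1} = (a_n - b_n) / 2.
        a_n, b_n = self.a, self.b
        total = 0.5 * (a_n**2 - b_n**2)  # n = 0 term
        weight = 1.0                     # 2**(n-1) for n = 1
        for _ in range(64):  # quadratic convergence; 64 is only a safety cap
            if abs(a_n - b_n) <= 1e-15 * a_n:
                break
            c_n = (a_n - b_n) / 2
            a_n, b_n = (a_n + b_n) / 2, math.sqrt(a_n * b_n)
            total += weight * c_n**2
            weight *= 2
        return 2 * math.pi * (self.a**2 - total) / a_n
```

If a or b can change after construction, the cached value goes stale; `del ellipse.C` drops it so the next access recomputes it.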

Answer 6:

Included below is an approach which I've used before for partial data dependency and result caching. It actually resembles the answer @ljetibo provided with the following significant differences:

relationships are defined at the class level
work is done at definition time to permute them into a canonical reference for dependency sets and the target variables that may be calculated if they are available
calculated values are cached but there is no requirement that the instance be immutable since stored values may be invalidated (e.g. total transformation is possible)
Non-lambda based calculations of values giving some more flexibility

I've written it from scratch so there may be some things I've missed but it should cover the following adequately:

Define data dependencies and reject initialising data which is inadequate
Cache the results of calculations to avoid extra work
Returns a meaningful exception with the names of variables which are not derivable from the specified information

Of course this can be split into a base class to do the core work and a subclass which defines the basic relationships and calculations only. Splitting the logic for the extended relationship mapping out of the subclass might be an interesting problem though since the relationships must presumably be specified in the subclass.

Edit: it's important to note that this implementation does not reject inconsistent initialising data (e.g. specifying a, b, c and A such that it does not fulfil the mutual expressions for calculation). The assumption being that only the minimal set of meaningful data should be used by the instantiator. The requirement from the OP can be enforced without too much trouble via instantiation time evaluation of consistency between the provided kwargs.

import itertools


class Foo(object):
    # Define the base set of dependencies
    relationships = {
        ("a", "b", "c"): "A",
        ("c", "d"): "B",
    }

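    # Formulate inverse relationships from the base set
    # This is a little wasteful but gives cheap dependency set lookup at
    # runtime. Iterate over a copy: adding keys while iterating over the
    # dict itself raises RuntimeError in Python 3.
    for deps, target in list(relationships.items()):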
        deps = set(deps)
        for dep in deps:
            alt_deps = deps ^ set([dep, target])
            relationships[tuple(alt_deps)] = dep

    def __init__(self, **kwargs):
        available = set(kwargs)
        derivable = set()
        # Run through the permutations of available variables to work out what
        # other variables are derivable given the dependency relationships
        # defined above
        while True:
            for r in range(1, len(available) + 1):
                for permutation in itertools.permutations(available, r):
                    if permutation in self.relationships:
                        derivable.add(self.relationships[permutation])
            if derivable.issubset(available):
                # If the derivable set adds nothing to what is already noted as
                # available, that's all we can get
                break
            else:
                available |= derivable

        # If any of the variables are underivable, raise an exception
        underivable = set(self.relationships.values()) - available
        if len(underivable) > 0:
            raise TypeError(
                "The following properties cannot be derived:\n\t{0}"
                .format(tuple(underivable))
            )
        # Store the kwargs in a mapping where we'll also cache other values as
        # are calculated
        self._value_dict = kwargs

    def __getattribute__(self, name):
        # Try to collect the value from the stored value mapping or fall back
        # to the method which calculates it below
        try:
            return super(Foo, self).__getattribute__("_value_dict")[name]
        except (AttributeError, KeyError):
            return super(Foo, self).__getattribute__(name)

    # This is left hidden but not treated as a staticmethod since it needs to
    # be run at definition time
    def __storable_property(getter):
        name = getter.__name__

        def storing_getter(inst):
            # Calculates the value using the defined getter and save it
            value = getter(inst)
            inst._value_dict[name] = value
            return value

        def setter(inst, value):
            # Change the stored value and invalidate saved values which
            # depend on it
            inst._value_dict[name] = value
            for deps, target in inst.relationships.items():
                if name in deps and target in inst._value_dict:
                    delattr(inst, target)

        def deleter(inst):
            # Delete the stored value
            del inst._value_dict[name]

        # Pass back a property wrapping the get/set/deleters
        return property(storing_getter, setter, deleter, getter.__doc__)

    ## Each variable must have a single defined calculation to get its value
    ## Decorate these with the __storable_property function
    @__storable_property
    def a(self):
        return self.A - self.b - self.c

    @__storable_property
    def b(self):
        return self.A - self.a - self.c

    @__storable_property
    def c(self):
        return self.A - self.a - self.b

    @__storable_property
    def d(self):
        return self.B / self.c

    @__storable_property
    def A(self):
        return self.a + self.b + self.c

    @__storable_property
    def B(self):
        return self.c * self.d


if __name__ == "__main__":
    f = Foo(a=1, b=2, A=6, d=10)
    print(f.a, f.A, f.B)  # 1 6 30
    f.d = 20
    print(f.B)  # 60
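The Edit above notes that the OP's consistency requirement could be enforced at instantiation time. A minimal standalone sketch of such a check, which `Foo.__init__` could call on its kwargs before storing them; `FORMULAS` and `check_consistency` are hypothetical names, and the two formulas duplicate the getters of `A` and `B`.

```python
import math

# Hypothetical copy of Foo's base relationships: target -> (deps, formula)
FORMULAS = {
    'A': (('a', 'b', 'c'), lambda a, b, c: a + b + c),
    'B': (('c', 'd'), lambda c, d: c * d),
}


def check_consistency(**kwargs):
    """Raise ValueError if a fully specified relationship does not hold.

    Relationships with any variable missing from kwargs are skipped.
    """
    for target, (deps, formula) in FORMULAS.items():
        if all(name in kwargs for name in deps + (target,)):
            expected = formula(*(kwargs[name] for name in deps))
            if not math.isclose(expected, kwargs[target]):
                raise ValueError('%s=%r, but its dependencies give %r'
                                 % (target, kwargs[target], expected))
```

This only compares relationships whose every variable was passed in, so an inconsistency that appears only after deriving an intermediate value (for example c from a, b and A, then B from c and d) is not caught.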

Answer 7:

I would check for the consistency of the data each time you set a parameter.

import math
tol = 1e-9
class Ellipse(object):
    def __init__(self, a=None, b=None, A=None, a_b=None):
        self.a = self.b = self.A = self.a_b = None 
        self.set_short_axis(a)
        self.set_long_axis(b)
        self.set_area(A)
        self.set_maj_min_axis(a_b)

    def set_short_axis(self, a):
        self.a = a
        self.check()

    def set_long_axis(self, b):
        self.b = b
        self.check()

    def set_maj_min_axis(self, a_b):
        self.a_b = a_b
        self.check()

    def set_area(self, A):
        self.A = A
        self.check()

    def check(self):
        if self.a and self.b and self.A:
            if not math.fabs(self.A - self.a * self.b * math.pi) <= tol:
                raise Exception('A=a*b*pi does not check!')
        if self.a and self.b and self.a_b:
            if not math.fabs(self.a / float(self.b) - self.a_b) <= tol:
                raise Exception('a_b=a/b does not check!')

The main:

e1 = Ellipse(a=3, b=3, a_b=1)
e2 = Ellipse(a=3, b=3, A=27)

The first ellipse object is consistent; set_maj_min_axis(1) passes fine.

The second is not; set_area(27) fails, at least within the 1e-9 tolerance specified, and raises an error.

Edit 1

Some additional lines are needed in the check() method for the cases when users supply a, a_b and A (or b, a_b and A):

    if self.a and self.A and self.a_b:
        if not math.fabs(self.A - self.a **2 / self.a_b * math.pi) <= tol:
            raise Exception('A=a*a/a_b*pi does not check!')
    if self.b and self.A and self.a_b:
        if not math.fabs(self.A - self.b **2 * self.a_b * math.pi) <= tol:
            raise Exception('A=b*b*a_b*pi does not check!')

Main:

e3 = Ellipse(b=3.0, a_b=1.0, A=27)

An arguably wiser way would be to calculate self.b = self.a / float(self.a_b) directly in the set method of a_b. Since you decide the order of the set methods in the constructor yourself, that might be more manageable than writing dozens of checks.
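A minimal sketch of that suggestion, keeping this answer's naming (a short axis, b long axis, a_b = a / b) but only the a_b setter for brevity. Turning the area into a computed property is an assumption of the sketch: once A follows from a and b, it no longer needs its own consistency check.

```python
import math


class Ellipse(object):
    """Sketch: the a_b setter derives a missing axis instead of only checking."""

    def __init__(self, a=None, b=None, a_b=None):
        self.a, self.b, self.a_b = a, b, None
        if a_b is not None:
            self.set_maj_min_axis(a_b)

    def set_maj_min_axis(self, a_b):
        """Store a/b and derive whichever axis is still missing."""
        if self.a is not None and self.b is not None:
            # Both axes known: the ratio can only be checked, not used
            if not math.isclose(self.a / float(self.b), a_b, rel_tol=1e-9):
                raise ValueError('a_b=a/b does not check!')
        elif self.a is not None:
            self.b = self.a / float(a_b)
        elif self.b is not None:
            self.a = self.b * a_b
        self.a_b = a_b

    @property
    def A(self):
        # Derived from a and b, so it never needs a consistency check
        return math.pi * self.a * self.b
```

If a_b is the only value given, neither axis can be derived and accessing A fails, so a real constructor would still need a "not enough parameters" check.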