Making a vectorized numpy function behave like a u

2019-07-13 05:17发布

问题:

Let's suppose that we have a Python function that takes in Numpy arrays and returns another array:

import numpy as np

def f(x, y, method='p'):
    """Parameters:  x (np.ndarray) , y (np.ndarray), method (str)
    Returns: np.ndarray"""
    z = x.copy()    
    if method == 'p':
        mask = x < 0
    else:
        mask = x > 0
    z[mask] = 0
    return z*y

although the actual implementation does not matter. We can assume that x and y will always be arrays of the same shape, and that the output is of the same shape as x.

The question is what would be the simplest/most elegant way of wrapping such function so it would work with both ND arrays (N>1) and scalar arguments, in a manner somewhat similar to universal functions in Numpy.

For instance, the expected output for the above function should be,

In [1]: f_ufunc(np.arange(-1,2), np.ones(3), method='p') 
Out[1]: array([ 0.,  0.,  1.]) # random array input -> output of the same shape

In [2]: f_ufunc(np.array([1]), np.array([1]), method='p') 
Out[2]: array([1])   # array input of len 1 -> output of len 1

In [3]: f_ufunc(1, 1, method='p')
Out[3]: 1  # scalar input -> scalar output
  • The function f cannot be changed, and it will fail if given a scalar argument for x or y.

  • When x and y are scalars, we transform them to 1D arrays, do the calculation then transform them back to scalars at the end.

  • f is optimized to work with arrays, scalar input being mostly a convenience. So writing a function that work with scalars and then using np.vectorize or np.frompyfunc would not be acceptable.

A beginning of an implementation could be,

def atleast_1d_inverse(res, x):
    # this function fails in some cases (see point 1 below).
    if res.shape[0] == 1:
        return res[0]
    else:
        return res

def ufunc_wrapper(func, args=[]):
    """ func:  the wrapped function
        args:  arguments of func to which we apply np.atleast_1d """

    # this needs to be generated dynamically depending on the definition of func
    def wrapper(x, y, method='p'):
        # we apply np.atleast_1d to the variables given in args
        x = np.atleast_1d(x)
        y = np.atleast_1d(x)

        res = func(x, y, method='p')

        return atleast_1d_inverse(res, x)

    return wrapper

f_ufunc = ufunc_wrapper(f, args=['x', 'y'])

which mostly works, but will fail the tests 2 above, producing a scalar output instead of a vector one. If we want to fix that, we would need to add more tests on the type of the input (e.g. isinstance(x, np.ndarray), x.ndim>0, etc), but I'm afraid to forget some corner cases there. Furthermore, the above implementation is not generic enough to wrap a function with a different number of arguments (see point 2 below).

This seems to be a rather common problem, when working with Cython / f2py function, and I was wondering if there was a generic solution for this somewhere?

Edit: a bit more precisions following @hpaulj's comments. Essentially, I'm looking for

  1. a function that would be the inverse of np.atleast_1d, such as atleast_1d_inverse( np.atleast_1d(x), x) == x, where the second argument is only used to determine the type or the number of dimensions of the original object x. Returning numpy scalars (i.e. arrays with ndim = 0) instead of a python scalar is ok.

  2. A way to inspect the function f and generate a wrapper that is consistent with its definition. For instance, such wrapper could be used as,

    f_ufunc = ufunc_wrapper(f, args=['x', 'y'])

    and then if we have a different function def f2(x, option=2): return x**2, we could also use

    f2_ufunc = ufunc_wrapper(f2, args=['x']).

Note: the analogy with ufuncs might be a bit limited, as this corresponds to the opposite problem. Instead of having a scalar function that we transform to accept both vector and scalar input, I have a function designed to work with vectors (that can be seen as something that was previously vectorized), that I would like to accept scalars again, without changing the original function.

回答1:

This doesn't fully answer the question of making a vectorized function truly behave like a ufunc, but I did recently run into a slight annoyance with numpy.vectorize that sounds similar to your issue. That wrapper insists on returning an array (with ndim=0 and shape=()) even if passed scalar inputs.

But it appears that the following does the right thing. In this case I am vectorizing a simple function to return a floating point value to a certain number of significant digits.

def signif(x, digits):
    return round(x, digits - int(np.floor(np.log10(abs(x)))) - 1)

def vectorize(f):
    vf = np.vectorize(f)

    def newfunc(*args, **kwargs):
        return vf(*args, **kwargs)[()]
    return newfunc

vsignif = vectorize(signif)

This gives

>>> vsignif(0.123123, 2)
0.12
>>> vsignif([[0.123123, 123.2]], 2)
array([[   0.12,  120.  ]])
>>> vsignif([[0.123123, 123.2]], [2, 1])
array([[   0.12,  100.  ]])