Numpy: Apply an array of functions to a same lengt

Posted 2020-07-06 08:56

Question:

I have numpy arrays whose columns contain different data types, and each column should also have a different function applied to it. I have the functions in an array as well.

Let's say:

a = array([[ 1, 2.0, "three"],
           [ 4, 5.0, "six"  ]], dtype=object)

functions_arr = array([act_on_int, act_on_float, act_on_str])

I can certainly think of ways to do this by dividing the thing, but the one thing that seems most natural to me is to think of it as an elementwise multiplication with broadcasting, and the functions as operators. So I'd like to do something like

functions_arr*a

and get the effect of

array([[act_on_int(1), act_on_float(2.0), act_on_str("three")],
       [act_on_int(4), act_on_float(5.0), act_on_str("six")  ]])

Do you know of a way to achieve something along those lines?

Edit: I changed the definition of the array in the question to include dtype=object, as people pointed out this is important for the array to store types the way I intended.

Thank you for your answers and comments! I have accepted senderle's answer and feel it is very close to what I had in mind.

Since there seems to have been some confusion about how I consider the operation to be like multiplication, let me clarify that with another example:

As you're well aware, an operation like:

v = array([1,2,3])
u = array([[5,7,11],
           [13,17,19]])
v*u

will broadcast v over the rows of u and yield

array([[ 1*5, 2*7,  3*11],
       [1*13, 2*17, 3*19]])

i.e.

array([[ 5, 14, 33],
       [13, 34, 57]])
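
For reference, here is a minimal runnable version of the broadcasting example above. It assumes only that NumPy is imported as np; the name `product` is introduced here for illustration.

```python
import numpy as np

v = np.array([1, 2, 3])        # shape (3,)
u = np.array([[5, 7, 11],
              [13, 17, 19]])   # shape (2, 3)

# v is stretched along the first axis so it lines up with every row of u.
product = v * u
print(product.shape)
print(product)
```

The trailing dimension of both arrays is 3, which is what lets the one-dimensional v pair up with each column of u.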

If we were now to replace v with, for instance, the del operator, we would have (the following is not actually working Python code):

V = array([(d/dx),(d/dy),(d/dz)])
u = array([[5,7,11],
           [13,17,19]])
V*u

yielding (in spirit)

array([[(d/dx)5,  (d/dy)7,  (d/dz)11],
       [(d/dx)13, (d/dy)17, (d/dz)19]])

I admit that taking the derivative of a bunch of constants would not be the most interesting of operations, so feel free to replace u with some symbolic mathematical expression in x, y and z. At any rate, I hope this at least makes clearer both my reasoning and the bit about "(using a python function as an operator?)" in the title.

Answer 1:

As Sven Marnach reminded me, the array you've created is probably an array of Python objects. Any operation on them will likely be much slower than pure numpy operations. However, you can do what you've asked pretty easily, as long as you don't actually expect this to be very fast! It's not too different from what AFoglia suggested, but it's closer to being exactly what you asked for:

>>> a = numpy.array([[ 1, 2.0, "three"],
...                  [ 4, 5.0, "six"  ]], dtype=object)
>>> funcs = [lambda x: x + 10, lambda x: x / 2, lambda x: x + '!']
>>> apply_vectorized = numpy.vectorize(lambda f, x: f(x), otypes=[object])
>>> apply_vectorized(funcs, a)
array([[11, 1.0, 'three!'],
       [14, 2.5, 'six!']], dtype=object)
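
The same idea as a self-contained script, using the names from the question. The bodies of act_on_int, act_on_float and act_on_str are hypothetical stand-ins, since the question never defines them.

```python
import numpy as np

# Hypothetical stand-ins for the functions named in the question.
def act_on_int(x):
    return x + 10

def act_on_float(x):
    return x / 2

def act_on_str(x):
    return x + "!"

a = np.array([[1, 2.0, "three"],
              [4, 5.0, "six"]], dtype=object)

# An object array of callables, shape (3,).
functions_arr = np.array([act_on_int, act_on_float, act_on_str], dtype=object)

# vectorize broadcasts shape (3,) against shape (2, 3), so each function
# is paired with every element of its own column.
apply_vectorized = np.vectorize(lambda f, x: f(x), otypes=[object])
result = apply_vectorized(functions_arr, a)
print(result)
```

Note that numpy.vectorize is essentially a Python-level loop, so this buys the broadcasting semantics rather than speed.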

Also echoing AFoglia here, there's a good chance you'd be better off using a record array -- this allows you to divide the array up as you like, and work with it in a more natural way using numpy ufuncs -- which are much faster than Python functions, generally:

>>> a = numpy.rec.array([(1, 2.0, 'three'), (4, 5.0, 'six')],
...       dtype=[('int', '<i8'), ('float', '<f8'), ('str', '|S10')])
>>> a['int']
array([1, 4])
>>> a['float']
array([ 2.,  5.])
>>> a['str']
rec.array(['three', 'six'], 
      dtype='|S10')
>>> a['int'] += 10
>>> a['int']
array([11, 14])
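
A sketch of the record-array route as a standalone script. The '<U10' string dtype is an assumption made for Python 3 (the transcript above uses the bytes dtype '|S10'), and the name `halved` is introduced only for illustration.

```python
import numpy as np

# '<U10' (unicode) is assumed here for Python 3; the transcript uses '|S10'.
a = np.rec.array([(1, 2.0, "three"), (4, 5.0, "six")],
                 dtype=[("int", "<i8"), ("float", "<f8"), ("str", "<U10")])

# Each field is an ordinary typed column, so numpy arithmetic applies directly.
a["int"] += 10            # in-place update of the integer field
halved = a["float"] / 2   # new float array; the record array is unchanged

print(a["int"])
print(halved)
```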


Answer 2:

It's not broadcasting, because the original array only had one dimension. It looks like it has 2 dimensions because each element has three members (an int, a float, and a string), but to numpy, that's simply the type, and the number of dimensions is one.

Nor is it multiplication, because you are applying the function to each element. (It's no more multiplication than addition, so functions_arr * a is misleading syntax.)

Still, you can write something analogous to what you want. I'd try numpy.vectorize. Without having tested it, and assuming the output dtype is the same as the original array's, I imagine it would look like this:

def act_on_row(row):
    return (act_on_int(row["int_field"]),
            act_on_float(row["float_field"]),
            act_on_str(row["str_field"]))

act_on_array = numpy.vectorize(act_on_row, otypes=[a.dtype])

acted_on = act_on_array(a)

I've never tried vectorize, and I don't know if it's tricky to get working with structured dtypes, but this should get you started.

The simpler solution though would be to just loop over the array by field.

rslt = numpy.empty((len(a),), dtype=a.dtype)

rslt["int_field"] = act_on_int(a["int_field"])
rslt["float_field"] = act_on_float(a["float_field"])
rslt["str_field"] = act_on_str(a["str_field"])

(You might need to vectorize each individual function, depending on what they do.)
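
A runnable sketch of the field-by-field approach. The field names follow the snippet above; the concrete dtype and the function bodies are hypothetical, chosen only so the example executes.

```python
import numpy as np

# Field names follow the answer; the dtype itself is an assumption.
dtype = [("int_field", "<i8"), ("float_field", "<f8"), ("str_field", "<U10")]
a = np.array([(1, 2.0, "three"), (4, 5.0, "six")], dtype=dtype)

# Hypothetical functions. The first two already work on whole arrays.
def act_on_int(x):
    return x + 10

def act_on_float(x):
    return x / 2

# str.upper handles one string at a time, so it is wrapped with vectorize.
act_on_str = np.vectorize(lambda s: s.upper())

rslt = np.empty(len(a), dtype=a.dtype)
rslt["int_field"] = act_on_int(a["int_field"])
rslt["float_field"] = act_on_float(a["float_field"])
rslt["str_field"] = act_on_str(a["str_field"])
print(rslt)
```

Only the string function needs the vectorize wrapper here, which illustrates the parenthetical remark above.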



Answer 3:

You're looking for the built-in function zip().

A simple example using lists:

>>> a = [[1, 2.0, "three"], [4, 5.0, "six"]]
>>> funcs = [lambda x: x**2, lambda y: y*2, lambda z: z.upper()]
>>> [[f(v) for v, f in zip(x, funcs)] for x in a]
[[1, 4.0, 'THREE'], [16, 10.0, 'SIX']]
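
If the data already lives in a numpy object array, the same zip() idea can be wrapped in a small helper that returns an object array again. The helper name apply_by_column is hypothetical and not part of numpy.

```python
import numpy as np

def apply_by_column(funcs, rows):
    """Apply funcs[j] to column j of every row and return an object array."""
    applied = [[f(v) for f, v in zip(funcs, row)] for row in rows]
    return np.array(applied, dtype=object)

a = np.array([[1, 2.0, "three"],
              [4, 5.0, "six"]], dtype=object)
funcs = [lambda x: x ** 2, lambda y: y * 2, lambda z: z.upper()]

result = apply_by_column(funcs, a)
print(result.tolist())  # [[1, 4.0, 'THREE'], [16, 10.0, 'SIX']]
```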


Tags: python numpy