Apply function to an array of tuples

Posted 2019-06-06 17:37

Question:

I have a function that I would like to apply to an array of tuples and I am wondering if there is a clean way to do it.

Normally I could use np.vectorize to apply the function to each item in the array. In this case, however, "each item" is a tuple, so numpy interprets the array as a 3d array and applies the function to each item within the tuple.

So I can assume that the incoming array is one of:

  1. tuple
  2. 1 dimensional array of tuples
  3. 2 dimensional array of tuples

I can probably write some looping logic but it seems like numpy most likely has something that does this more efficiently and I don't want to reinvent the wheel.

This is an example. I am trying to apply the tuple_converter function to each tuple in the array.

import numpy as np

array_of_tuples1 = np.array([
        [(1,2,3),(2,3,4),(5,6,7)],
        [(7,2,3),(2,6,4),(5,6,6)],
        [(8,2,3),(2,5,4),(7,6,7)],
    ])

array_of_tuples2 = np.array([
        (1,2,3),(2,3,4),(5,6,7),
    ])

plain_tuple = (1,2,3)



# Convert a single tuple to a number
def tuple_converter(tup):
    return tup[0]**2 + tup[1] + tup[2]

# Vectorizing applies the formula to each integer rather than each tuple
tuple_converter_vectorized = np.vectorize(tuple_converter)

print(tuple_converter_vectorized(array_of_tuples1))
print(tuple_converter_vectorized(array_of_tuples2))
print(tuple_converter_vectorized(plain_tuple))

Desired Output for array_of_tuples1:

[[ 6 11 38]
 [54 14 37]
 [69 13 62]]

Desired Output for array_of_tuples2:

[ 6 11 38]

Desired Output for plain_tuple:

6

But the code above produces this error, because it is trying to apply the function to an integer rather than a tuple:

<ipython-input-209-fdf78c6f4b13> in tuple_converter(tup)
     10 
     11 def tuple_converter(tup):
---> 12     return tup[0]**2 + tup[1] + tup[2]
     13 
     14 

IndexError: invalid index to scalar variable.

Answer 1:

array_of_tuples1 and array_of_tuples2 are not actually arrays of tuples, but just 3- and 2-dimensional arrays of integers:

In [1]: array_of_tuples1 = np.array([
   ...:         [(1,2,3),(2,3,4),(5,6,7)],
   ...:         [(7,2,3),(2,6,4),(5,6,6)],
   ...:         [(8,2,3),(2,5,4),(7,6,7)],
   ...:     ])

In [2]: array_of_tuples1
Out[2]: 
array([[[1, 2, 3],
        [2, 3, 4],
        [5, 6, 7]],

       [[7, 2, 3],
        [2, 6, 4],
        [5, 6, 6]],

       [[8, 2, 3],
        [2, 5, 4],
        [7, 6, 7]]])
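
A quick way to confirm this is to inspect the array's shape and dtype. The snippet below is only a small check; the variable name nested is made up for the illustration.

```python
import numpy as np

# Nested tuples of equal length are converted to a plain integer array;
# NumPy does not keep them as tuple objects.
nested = np.array([
    [(1, 2, 3), (2, 3, 4), (5, 6, 7)],
    [(7, 2, 3), (2, 6, 4), (5, 6, 6)],
    [(8, 2, 3), (2, 5, 4), (7, 6, 7)],
])

print(nested.shape)        # three axes: rows, columns, and the "tuple" axis
print(nested.dtype.kind)   # 'i' means a signed integer dtype
print(type(nested[0, 0]))  # each "tuple" is a 1-d ndarray, not a tuple
```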

np.vectorize calls your function once per element, and the elements here are integers, which is why the indexing fails. Instead, apply the function along the axis that holds the "tuples" (the last axis). The function then receives a 1-d array of three values, which it can index like any other sequence:

In [6]: np.apply_along_axis(tuple_converter, 2, array_of_tuples1)
Out[6]: 
array([[ 6, 11, 38],
       [54, 14, 37],
       [69, 13, 62]])

In [9]: np.apply_along_axis(tuple_converter, 1, array_of_tuples2)
Out[9]: array([ 6, 11, 38])
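
Since the "tuple" axis is always the last one, passing axis -1 should cover the plain tuple, 1-d and 2-d cases with one call. The sketch below assumes every tuple has the same length; convert_any and tuple_converter_sig are names invented here, not part of NumPy. It also shows np.vectorize with a core signature, which is another way to make the function receive a whole tuple at a time.

```python
import numpy as np

def tuple_converter(tup):
    return tup[0]**2 + tup[1] + tup[2]

def convert_any(x):
    # Hypothetical helper: treat the last axis as the "tuple" axis,
    # whatever the number of leading dimensions.
    return np.apply_along_axis(tuple_converter, -1, np.asarray(x))

# Alternative: tell np.vectorize that the function consumes a 1-d
# vector of length n and returns a scalar.
tuple_converter_sig = np.vectorize(tuple_converter, signature='(n)->()')

grid = [
    [(1, 2, 3), (2, 3, 4), (5, 6, 7)],
    [(7, 2, 3), (2, 6, 4), (5, 6, 6)],
    [(8, 2, 3), (2, 5, 4), (7, 6, 7)],
]
row = [(1, 2, 3), (2, 3, 4), (5, 6, 7)]
plain_tuple = (1, 2, 3)

for data in (grid, row, plain_tuple):
    print(convert_any(data))
    print(tuple_converter_sig(data))
```

Both variants still call the Python function once per tuple, so they are a convenience rather than a speed-up.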


Answer 2:

The other answer above is certainly correct, and probably what you're looking for. But I noticed you put the word "clean" into your question, and so I'd like to add this answer as well.

If we can assume that all the tuples are 3-element tuples (or that they have some constant number of elements), then there's a nice little trick that lets the same piece of code work on a single tuple, a 1d array of tuples, or a 2d array of tuples, without an if/else for the 1d/2d cases. I'd argue that avoiding switches is always cleaner (although I suppose this could be contested).

import numpy as np

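def map_to_tuples(x):
    x = np.asarray(x)
    # Collapse all leading dimensions so that each row is one 3-element "tuple".
    flattened = x.reshape(-1, 3)
    # Apply the formula row by row, then restore the original leading shape.
    converted = [tup[0]**2 + tup[1] + tup[2] for tup in flattened]
    return np.array(converted).reshape(x.shape[:-1])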

Outputs the following for your inputs (respectively), as desired:

[[ 6 11 38]
 [54 14 37]
 [69 13 62]]

[ 6 11 38]

6
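
If the formula only needs fixed positions within each tuple, the Python-level loop can probably be dropped by indexing the last axis with an ellipsis. This is a sketch under the same assumption of 3-element tuples; map_to_tuples_vectorized is a name made up for this example.

```python
import numpy as np

def map_to_tuples_vectorized(x):
    x = np.asarray(x)
    # x[..., k] selects element k of every "tuple", whatever the number
    # of leading dimensions, so the arithmetic runs on whole arrays.
    return x[..., 0]**2 + x[..., 1] + x[..., 2]

print(map_to_tuples_vectorized([
    [(1, 2, 3), (2, 3, 4), (5, 6, 7)],
    [(7, 2, 3), (2, 6, 4), (5, 6, 6)],
    [(8, 2, 3), (2, 5, 4), (7, 6, 7)],
]))
print(map_to_tuples_vectorized([(1, 2, 3), (2, 3, 4), (5, 6, 7)]))
print(map_to_tuples_vectorized((1, 2, 3)))
```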


Answer 3:

If you are serious about the tuples bit, you could define a structured dtype.

In [535]: dt=np.dtype('int,int,int')

In [536]: x1 = np.array([
        [(1,2,3),(2,3,4),(5,6,7)],
        [(7,2,3),(2,6,4),(5,6,6)],
        [(8,2,3),(2,5,4),(7,6,7)],
    ], dtype=dt)

In [537]: x1
Out[537]: 
array([[(1, 2, 3), (2, 3, 4), (5, 6, 7)],
       [(7, 2, 3), (2, 6, 4), (5, 6, 6)],
       [(8, 2, 3), (2, 5, 4), (7, 6, 7)]], 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])

Note that the display uses tuples. x1 is a 3x3 array of type dt. The elements, or records, are displayed as tuples. This is more useful if the tuple elements differ in type: float, integer, string, etc.

Now define a function that works with fields of such an array:

In [538]: def foo(tup):
    return tup['f0']**2 + tup['f1'] + tup['f2']

It applies neatly to x1.

In [539]: foo(x1)
Out[539]: 
array([[ 6, 11, 38],
       [54, 14, 37],
       [69, 13, 62]])

It also applies to a 1d array of the same dtype.

In [540]: x2=np.array([(1,2,3),(2,3,4),(5,6,7) ],dtype=dt)

In [541]: foo(x2)
Out[541]: array([ 6, 11, 38])

And a 0d array of matching type:

In [542]: foo(np.array(plain_tuple,dtype=dt))
Out[542]: 6

But foo(plain_tuple) won't work, since the function is written to work with named fields, not indexed ones.

The function could be modified to cast the input to the correct dtype if needed:

In [545]: def foo1(tup):
    temp = np.asarray(tup, dtype=dt)
    return temp['f0']**2 + temp['f1'] + temp['f2']

In [548]: plain_tuple
Out[548]: (1, 2, 3)

In [549]: foo1(plain_tuple) 
Out[549]: 6

In [554]: foo1([(1,2,3),(2,3,4),(5,6,7)])  # list of tuples
Out[554]: array([ 6, 11, 38])
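
One caveat: the cast in foo1 relies on the input being nested tuples. If the data has already become a plain integer array (as with array_of_tuples1 in the question), a conversion helper may be needed instead. The sketch below uses numpy.lib.recfunctions.unstructured_to_structured for that, and fixes the field type to 64-bit integers as an assumption of this example rather than something taken from the answer above.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Explicit 64-bit fields; the default field names are f0, f1, f2.
dt = np.dtype('i8,i8,i8')

def foo(rec):
    return rec['f0']**2 + rec['f1'] + rec['f2']

# A plain 3x3x3 integer array, as produced by np.array on nested tuples.
plain = np.array([
    [(1, 2, 3), (2, 3, 4), (5, 6, 7)],
    [(7, 2, 3), (2, 6, 4), (5, 6, 6)],
    [(8, 2, 3), (2, 5, 4), (7, 6, 7)],
], dtype=np.int64)

# The last axis becomes the fields of one record, leaving a 3x3 array.
records = rfn.unstructured_to_structured(plain, dtype=dt)

print(records.shape)
print(foo(records))
```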