Efficient way of appending / extracting list eleme

Posted 2019-09-11 15:06

Question:

Suppose I have the following list:

rays_all = [np.array([r11, r21, r31, r41]),
            np.array([r12, r22, r32, r42]),
            np.array([r13, r23, r33, r43]),
            np.array([r14, r24, r34, r44])]

All of r11, r21, r31, etc. are arrays with shape (3,) (think of each one as a vector in 3D space).

To extract the (4, 3) array np.array([r14, r24, r34, r44]), I simply use rays_all[-1]. To append a new array np.array([r15, r25, r35, r45]), I use rays_all.append.

Now I arrange the above vectors (r11,r12, etc) in an alternative way:

ray1 = [r11, r12, r13, r14]
ray2 = [r21, r22]
ray3 = [r31, r32, r33]
ray4 = [r41, r42, r43, r44]

Each 'ray' now has its own list, and the lists have different lengths. If I want to extract the last element of each list as an array, i.e. np.array([r14,r22,r33,r44]), what is the most efficient way to do so? Conversely, if I want to add the elements of the array np.array([r15,r23,r34,r45]) to the lists so that I have

ray1 = [r11, r12, r13, r14, r15]
ray2 = [r21, r22, r23]
ray3 = [r31, r32, r33, r34]
ray4 = [r41, r42, r43, r44, r45]

what is the most efficient way? I know I can just use a loop to do so, but I guess it is much slower than rays_all[-1] and rays_all.append()? Is there any 'vectorized' way of doing this?

Answer 1:

Be careful with mixing array and list operations.

Make some 3 element arrays and combine them as in your first case:

In [748]: r1,r2,r3,r4=np.arange(3),np.ones(3),np.zeros(3),np.arange(3)[::-1]
In [749]: x1=np.array((r1,r2))
In [750]: x2=np.array((r3,r4))
In [751]: rays=[x1,x2]
In [752]: rays
Out[752]: 
[array([[ 0.,  1.,  2.],
        [ 1.,  1.,  1.]]), array([[ 0.,  0.,  0.],
        [ 2.,  1.,  0.]])]

rays is now a list containing two 2d arrays (each of shape (2,3)). As you say, you can select an item from that list or append another array to it (you can append anything to it, not just a similar array). Operations on rays are list operations.

You could also create a 3d array:

In [758]: ray_arr=np.array((x1,x2))
In [759]: ray_arr
Out[759]: 
array([[[ 0.,  1.,  2.],
        [ 1.,  1.,  1.]],

       [[ 0.,  0.,  0.],
        [ 2.,  1.,  0.]]])
In [760]: ray_arr.shape
Out[760]: (2, 2, 3)
In [761]: ray_arr[-1]
Out[761]: 
array([[ 0.,  0.,  0.],
       [ 2.,  1.,  0.]])

You can select from ray_arr as with the list. But appending requires creating a new array via np.concatenate (possibly hidden in the np.append function). There is no 'in-place' append as there is for a list.

The last elements of all component arrays can be selected efficiently by indexing on the last dimension:
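A minimal sketch of what "appending" to the 3d array involves, assuming a hypothetical new (2,3) block named x3 (not part of the session above). np.concatenate returns a freshly allocated array and leaves the original untouched:

```python
import numpy as np

# Rebuild the arrays from the session above
x1 = np.array([np.arange(3.0), np.ones(3)])
x2 = np.array([np.zeros(3), np.arange(3.0)[::-1]])
ray_arr = np.array([x1, x2])              # shape (2, 2, 3)

# x3 is a made-up block used only for illustration
x3 = np.full((2, 3), 5.0)

# Add a leading axis to x3 so it can be joined along axis 0
bigger = np.concatenate([ray_arr, x3[None, ...]], axis=0)

print(bigger.shape)                        # (3, 2, 3)
print(ray_arr.shape)                       # (2, 2, 3), the original is unchanged
print(np.shares_memory(bigger, ray_arr))   # False: the data was copied
```

Because every such call copies all existing data, repeated appends in a loop are usually better done on a list, with a single conversion to an array at the end.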

In [762]: ray_arr[:,:,-1]
Out[762]: 
array([[ 2.,  1.],
       [ 0.,  0.]])

To get the corresponding values from the list rays you have to use a list comprehension (or another loop):

In [765]: [r[:,-1] for r in rays]
Out[765]: [array([ 2.,  1.]), array([ 0.,  0.])]

There's no indexing shortcut as with arrays.
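Applied to the ragged lists in the question, the same idea might look like the sketch below. The vectors are made-up placeholders (np.full values chosen so each one is recognizable), and only three rays are used to keep it short:

```python
import numpy as np

# Placeholder 3-vectors standing in for r11, r12, ... from the question
ray1 = [np.full(3, 11.0), np.full(3, 12.0)]
ray2 = [np.full(3, 21.0)]
ray3 = [np.full(3, 31.0), np.full(3, 32.0), np.full(3, 33.0)]
rays = [ray1, ray2, ray3]

# Last element of each ragged list, stacked into an (n_rays, 3) array
last = np.array([r[-1] for r in rays])
print(last.shape)          # (3, 3)

# Append one new vector (one row of `new`) to each list
new = np.array([[13.0] * 3, [22.0] * 3, [34.0] * 3])
for r, v in zip(rays, new):
    r.append(v)

print([len(r) for r in rays])   # [3, 2, 4]
```

Both steps are Python-level loops over the rays, but list.append and r[-1] are each constant-time, so the cost grows with the number of rays rather than with the length of each list.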

There are tools like zip (and others in itertools) that help you iterate through lists, and even rearrange values, e.g.

In [773]: list(zip(['a','b'],['c','d']))
Out[773]: [('a', 'c'), ('b', 'd')]
In [774]: list(zip(['a','b'],['c','d']))[-1]
Out[774]: ('b', 'd')

and with ragged sublists:

In [782]: list(zip(['a','b','c'],['d']))
Out[782]: [('a', 'd')]
In [783]: list(itertools.zip_longest(['a','b','c'],['d']))
Out[783]: [('a', 'd'), ('b', None), ('c', None)]

But I don't see how those will help with extracting values from your ray vectors.


Something worth exploring is to collect the base vectors into one 2d array, and use indexing to extract groups for various purposes:

In [867]: allrays=np.array([r1,r2,r3,r4])
In [868]: allrays
Out[868]: 
array([[ 0.,  1.,  2.],
       [ 1.,  1.,  1.],
       [ 0.,  0.,  0.],
       [ 2.,  1.,  0.]])

The 'z' coordinate for all rays:

In [869]: allrays[:,-1]
Out[869]: array([ 2.,  1.,  0.,  0.])

One subset of rays (since it is a slice it is a view)

In [871]: allrays[0:2,:]
Out[871]: 
array([[ 0.,  1.,  2.],
       [ 1.,  1.,  1.]])

Another subset:

In [872]: allrays[2:,:]
Out[872]: 
array([[ 0.,  0.,  0.],
       [ 2.,  1.,  0.]])

3 item subset, selected with a list - this is a copy

In [873]: allrays[[0,1,2],:]
Out[873]: 
array([[ 0.,  1.,  2.],
       [ 1.,  1.,  1.],
       [ 0.,  0.,  0.]])
In [874]: allrays[[3],:]
Out[874]: array([[ 2.,  1.,  0.]])

several subsets obtained by indexing:

In [875]: ind=[[0,1,2],[3]]
In [876]: [allrays[i] for i in ind]
Out[876]: 
[array([[ 0.,  1.,  2.],
        [ 1.,  1.,  1.],
        [ 0.,  0.,  0.]]), 
 array([[ 2.,  1.,  0.]])]

If the groups are contiguous, you can use split:

In [884]: np.split(allrays,[3])
Out[884]: 
[array([[ 0.,  1.,  2.],
        [ 1.,  1.,  1.],
        [ 0.,  0.,  0.]]), array([[ 2.,  1.,  0.]])]

The subarrays are views (check with the .__array_interface__ attribute).

This approach does, in effect, just move the ragged list problem up a level. Still, there is more flexibility. You could construct other indexing sublists, e.g.
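As an alternative to inspecting .__array_interface__ by hand, np.shares_memory gives a direct yes/no answer. The following sketch rebuilds allrays and compares the three selection methods used above:

```python
import numpy as np

allrays = np.array([np.arange(3.0), np.ones(3), np.zeros(3), np.arange(3.0)[::-1]])

parts = np.split(allrays, [3])     # contiguous groups: rows 0-2 and row 3
sliced = allrays[0:2, :]           # basic slice
fancy = allrays[[0, 1, 2], :]      # indexing with a list

print(all(np.shares_memory(p, allrays) for p in parts))  # True: views
print(np.shares_memory(sliced, allrays))                 # True: view
print(np.shares_memory(fancy, allrays))                  # False: copy

# Writing through a view changes the parent array
parts[0][0, 0] = 99.0
print(allrays[0, 0])               # 99.0
```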

In [877]: ind1=[i[-1] for i in ind]   # last of all groups
In [878]: ind1
Out[878]: [2, 3]
In [879]: ind2=[i[0] for i in ind]   # first of all groups
In [880]: ind2
Out[880]: [0, 3]

You could concatenate some new values on to allrays. You may then have to rebuild the indexing lists. But I suspect this sort of building is done once, while access is repeated.
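One way this bookkeeping could go is sketched below, with made-up data and one new vector per group. Here the index lists are extended rather than rebuilt from scratch, which is an assumption about how the groups grow:

```python
import numpy as np

# Flat store of vectors plus index lists describing the groups
allrays = np.arange(12.0).reshape(4, 3)    # 4 placeholder vectors
ind = [[0, 1, 2], [3]]                     # group membership, as above

# Last vector of every group in a single indexing call (this is a copy)
last = allrays[[g[-1] for g in ind]]
print(last.shape)                          # (2, 3)

# Append one new vector per group: one concatenate, then update the indices
new = np.array([[100.0, 101.0, 102.0],
                [200.0, 201.0, 202.0]])
start = len(allrays)
allrays = np.concatenate([allrays, new])
for k, g in enumerate(ind):
    g.append(start + k)

print(ind)                                 # [[0, 1, 2, 4], [3, 5]]
```

After the append the groups are no longer contiguous in allrays, so np.split no longer applies, but indexing with the lists in ind still works.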


An earlier SO question about accessing values from the img produced by plt.pcolormesh (and plt.pcolor) comes to mind. One maintains an image as a surface on the 2d mesh, the other, more general, is just a collection of quadrilaterals, each with a color and path defining its boundary.



Answer 2:

In answer to your specific question, a list containing the last element of the four "ray" lists is, in general [ray1[-1],ray2[-1],ray3[-1],ray4[-1]].

Since your main concern here seems to be execution speed, I assume you have to perform this operation over and over again. Have you considered creating a little data structure that represents the last element, say, last_element = [r1x,r2x,r3x,r4x] and maintaining its value as you step through the problem? Each time you change last_element you append new data to the other lists as necessary. In other words, instead of repeatedly extracting the last element from the big lists, build the big lists step-by-step from the last element. That would have to be more efficient as long as you've got to build those big lists anyway. Would it work for your problem?
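A rough sketch of that idea, under the assumption that at each step only some rays receive a new point. The names last_element, history, and advance are hypothetical and chosen for illustration only:

```python
import numpy as np

n_rays = 3
last_element = np.zeros((n_rays, 3))       # current end point of every ray
history = [[last_element[i].copy()] for i in range(n_rays)]

def advance(active, new_points):
    """Set new end points for the rays listed in `active` and record them."""
    last_element[active] = new_points      # one vectorized assignment
    for i, p in zip(active, new_points):
        history[i].append(p.copy())        # per-ray ragged history

advance([0, 2], np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]))
advance([0], np.array([[2.0, 0.0, 0.0]]))

print([len(h) for h in history])           # [3, 1, 2]
print(last_element[0])                     # end point of ray 0
```

The array last_element is always available for vectorized work, and the ragged lists are only ever appended to, never searched for their last entries.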