Numpy append: Automatically cast an array of the w

Is there a way to do the following without an if clause?

I'm reading a set of netcdf files with pupynere and want to build an array with numpy append. Sometimes the input data is multi-dimensional (see variable "a" below), sometimes one-dimensional ("b"), but the number of elements in the last dimension is always the same ("9" in the example below).

> import numpy as np
> a = np.arange(27).reshape(3,9)
> b = np.arange(9)
> a.shape
(3, 9)
> b.shape
(9,)

This works as expected:

> np.append(a,a, axis=0)
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
   [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
   [18, 19, 20, 21, 22, 23, 24, 25, 26],
   [ 0,  1,  2,  3,  4,  5,  6,  7,  8],
   [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
   [18, 19, 20, 21, 22, 23, 24, 25, 26]])

But appending b does not work so elegantly:

> np.append(a,b, axis=0)
ValueError: arrays must have same number of dimensions

The problem with append is (from the numpy manual):

"When axis is specified, values must have the correct shape."

I'd have to reshape b first in order to get the right result.

> np.append(a,b.reshape(1,9), axis=0)
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
   [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
   [18, 19, 20, 21, 22, 23, 24, 25, 26],
   [ 0,  1,  2,  3,  4,  5,  6,  7,  8]])

So, in my file reading loop, I'm currently using an if clause like this:

result = np.empty((0, 9), dtype=int)  # start with zero rows so there is something to append to
for i in [a, b]:
    if i.ndim == 2:
        result = np.append(result, i, axis=0)
    else:
        result = np.append(result, i.reshape(1, 9), axis=0)

Is there a way to append "a" and "b" without the if statement?

EDIT: While @Sven answered the original question perfectly (using np.atleast_2d()), he (and others) pointed out that the code is inefficient. In an answer below, I combined their suggestions and replaced my original code. It should be much more efficient now. Thanks.

Tags: python list performance numpy append

4 answers

趁早两清

#2 · 2019-02-17 15:17

You can just add all of the arrays to a list, then use np.vstack() to concatenate them all together at the end. This avoids constantly reallocating the growing array with every append.

|1> a = np.arange(27).reshape(3,9)

|2> b = np.arange(9)

|3> np.vstack([a,b])
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23, 24, 25, 26],
       [ 0,  1,  2,  3,  4,  5,  6,  7,  8]])


戒情不戒烟

#3 · 2019-02-17 15:21

As pointed out, np.append() has to reallocate and copy the whole array on every call. An alternative solution that allocates once would be something like this:

total_size = 0
for i in [a,b]:
    total_size += i.size

result = np.empty(total_size, dtype=a.dtype)
offset = 0
for i in [a,b]:
    # copy in the array
    result[offset:offset+i.size] = i.ravel()
    offset += i.size

# if you know it's always divisible by 9:
result = result.reshape(result.size//9, 9)

If you can't precompute the array size, then perhaps you can put an upper bound on the size and then just preallocate a block that will always be big enough. Then you can just make the result a view into that block:

result = result[0:known_final_size]
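The upper-bound idea can be sketched as follows. This is only an illustration under stated assumptions: max_rows, buffer, chunks and n_rows are hypothetical names, the bound of 100 rows is made up, and the row width of 9 is taken from the question.

```python
import numpy as np

a = np.arange(27).reshape(3, 9)
b = np.arange(9)
chunks = [a, a, b, a]  # stand-in for the arrays read from the files

max_rows = 100  # assumed upper bound on the total number of rows
buffer = np.empty((max_rows, 9), dtype=a.dtype)  # allocated once

n_rows = 0
for chunk in chunks:
    rows = np.atleast_2d(chunk)  # (9,) becomes (1, 9); (3, 9) is unchanged
    buffer[n_rows:n_rows + rows.shape[0]] = rows
    n_rows += rows.shape[0]

result = buffer[:n_rows]  # a view onto the filled part, no copy
```

Because result is a view, the unused tail of buffer stays allocated for as long as result is alive; calling result.copy() would release it at the cost of one extra copy.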


仙女界的扛把子

#4 · 2019-02-17 15:28

I'm going to improve my code with the help of @Sven, @Henry and @Robert. @Sven answered the question, so he earns the reputation for this question, but, as highlighted by him and others, there is a more efficient way of doing what I want.

This involves using a Python list, where each append costs amortized O(1), whereas every numpy.append() call copies the whole array, so building the result from N pieces costs O(N**2) overall. Afterwards, the list is converted to a numpy array:

Suppose i has the shape of either a or b:

> a = np.arange(27).reshape(3,9)
> b = np.arange(9)
> a.shape
(3, 9)
> b.shape
(9,)

Initialise list and append all read data, e.g. if data appear in order 'aaba'.

> mList = []
> for i in [a,a,b,a]:
     mList.append(i)

Your mList will look like this:

> mList
[array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
   [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
   [18, 19, 20, 21, 22, 23, 24, 25, 26]]),
 array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
   [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
   [18, 19, 20, 21, 22, 23, 24, 25, 26]]),
 array([0, 1, 2, 3, 4, 5, 6, 7, 8]),
 array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
   [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
   [18, 19, 20, 21, 22, 23, 24, 25, 26]])]

Finally, vstack the list to form a numpy array:

> result = np.vstack(mList)
> result.shape
(10, 9)
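For anyone who wants to check the difference on their own data, here is a rough sketch that builds the same array both ways and times them. The function names and the chunk count of 200 are my own choices for illustration, and the timings will vary by machine, so no numbers are quoted.

```python
import timeit
import numpy as np

a = np.arange(27).reshape(3, 9)
b = np.arange(9)
chunks = [a, a, b, a] * 50  # 200 chunks standing in for file reads

def build_with_append(chunks):
    # Repeated np.append: every call copies everything collected so far.
    result = np.empty((0, 9), dtype=int)
    for chunk in chunks:
        result = np.append(result, np.atleast_2d(chunk), axis=0)
    return result

def build_with_list(chunks):
    # Collect references in a list, then copy once in np.vstack.
    collected = []
    for chunk in chunks:
        collected.append(chunk)
    return np.vstack(collected)

same = np.array_equal(build_with_append(chunks), build_with_list(chunks))
t_append = timeit.timeit(lambda: build_with_append(chunks), number=10)
t_list = timeit.timeit(lambda: build_with_list(chunks), number=10)
print(same, t_append, t_list)
```

Both functions should return identical arrays; only the amount of copying along the way differs.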

Thanks again for valuable help.


迷人小祖宗

#5 · 2019-02-17 15:35

You can use numpy.atleast_2d():

result = np.append(result, np.atleast_2d(i), axis=0)

That said, note that the repeated use of numpy.append() is a very inefficient way to build a NumPy array -- it has to be reallocated in every step. If at all possible, preallocate the array with the desired final size and populate it afterwards using slicing.
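To make the atleast_2d() behaviour concrete, here is a small self-contained sketch using the arrays from the question. The empty (0, 9) starting value for result is an assumption, since the question does not show how result is initialised.

```python
import numpy as np

a = np.arange(27).reshape(3, 9)
b = np.arange(9)

# atleast_2d leaves 2-D input alone and adds a leading axis to 1-D input.
print(np.atleast_2d(a).shape)  # (3, 9)
print(np.atleast_2d(b).shape)  # (1, 9)

result = np.empty((0, 9), dtype=a.dtype)  # assumed starting point: zero rows
for i in [a, b]:
    result = np.append(result, np.atleast_2d(i), axis=0)
print(result.shape)  # (4, 9)
```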
