Creating sliding windows of NaN padded elements of

Posted 2019-02-19 11:37

Question:

I have a time series x[0], x[1], ... x[n-1], stored as a one-dimensional NumPy array. I would like to convert it to the following matrix:

NaN,        ... , NaN ,   x[0]
NaN,        ... , x[0],   x[1]
.
.
NaN,  x[0], ... , x[n-3],x[n-2]
x[0], x[1], ... , x[n-2],x[n-1]

I would like to use this matrix to speed up time-series calculations. Is there a function in NumPy or SciPy to do this? (I don't want to use a Python for loop to do it.)

Answer 1:

One approach with np.lib.stride_tricks.as_strided -

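import numpy as np

def nanpad_sliding2D(a):
    L = a.size
    # Prepend L-1 NaNs so that every row of the result is a full-length window
    a_ext = np.concatenate((np.full(L - 1, np.nan), a))
    n = a_ext.strides[0]  # byte step between consecutive elements
    strided = np.lib.stride_tricks.as_strided
    # Row i starts one element after row i-1, hence the equal strides (n, n)
    return strided(a_ext, shape=(L, L), strides=(n, n))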

Sample run -

In [41]: a
Out[41]: array([48, 82, 96, 34, 93, 25, 51, 26])

In [42]: nanpad_sliding2D(a)
Out[42]: 
array([[ nan,  nan,  nan,  nan,  nan,  nan,  nan,  48.],
       [ nan,  nan,  nan,  nan,  nan,  nan,  48.,  82.],
       [ nan,  nan,  nan,  nan,  nan,  48.,  82.,  96.],
       [ nan,  nan,  nan,  nan,  48.,  82.,  96.,  34.],
       [ nan,  nan,  nan,  48.,  82.,  96.,  34.,  93.],
       [ nan,  nan,  48.,  82.,  96.,  34.,  93.,  25.],
       [ nan,  48.,  82.,  96.,  34.,  93.,  25.,  51.],
       [ 48.,  82.,  96.,  34.,  93.,  25.,  51.,  26.]])
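The same matrix can probably be obtained without setting strides by hand. The following is a minimal sketch, assuming NumPy 1.20 or later, where np.lib.stride_tricks.sliding_window_view is available. The function name nanpad_sliding2D_swv is made up for illustration; the padding step is the same as in the function above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # assumes NumPy >= 1.20

def nanpad_sliding2D_swv(a):
    # Hypothetical name, used only for this sketch.
    a = np.asarray(a)
    L = a.size
    # Same padding as before: L-1 NaNs in front of the data
    a_ext = np.concatenate((np.full(L - 1, np.nan), a))
    # Windows of length L over 2*L - 1 elements give exactly L rows
    return sliding_window_view(a_ext, L)

a = np.array([48, 82, 96, 34, 93, 25, 51, 26])
out = nanpad_sliding2D_swv(a)
print(out.shape)            # (8, 8)
print(out[-1])              # last row holds the original series as floats
print(out.flags.writeable)  # False: the view is read-only by default
```

The result is also a view into the padded array, but it is read-only by default, which guards against accidentally writing through overlapping windows.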

Memory efficiency with strides

As @Eric mentioned in the comments, this stride-based approach is memory efficient because the output is simply a view into the NaN-padded 1D array. Let's test this out -

In [158]: a   # Sample 1D input
Out[158]: array([37, 95, 87, 10, 35])

In [159]: L = a.size  # Run the posted approach
     ...: a_ext = np.concatenate(( np.full(a.size-1,np.nan) ,a))
     ...: n = a_ext.strides[0]
     ...: strided = np.lib.stride_tricks.as_strided     
     ...: out = strided(a_ext, shape=(L,L), strides=(n,n))
     ...: 

In [160]: np.may_share_memory(a_ext, out)  # Output might be a view into the extended version
Out[160]: True

Let's confirm that the output is indeed a view by assigning values into a_ext and then checking out.

Initial values of a_ext and out :

In [161]: a_ext
Out[161]: array([ nan,  nan,  nan,  nan,  37.,  95.,  87.,  10.,  35.])

In [162]: out
Out[162]: 
array([[ nan,  nan,  nan,  nan,  37.],
       [ nan,  nan,  nan,  37.,  95.],
       [ nan,  nan,  37.,  95.,  87.],
       [ nan,  37.,  95.,  87.,  10.],
       [ 37.,  95.,  87.,  10.,  35.]])

Modify a_ext :

In [163]: a_ext[:] = 100

See the new out :

In [164]: out
Out[164]: 
array([[ 100.,  100.,  100.,  100.,  100.],
       [ 100.,  100.,  100.,  100.,  100.],
       [ 100.,  100.,  100.,  100.,  100.],
       [ 100.,  100.,  100.,  100.,  100.],
       [ 100.,  100.,  100.,  100.,  100.]])

This confirms that it's a view.

Finally, let's test out the memory requirements :

In [131]: a_ext.nbytes
Out[131]: 72

In [132]: out.nbytes
Out[132]: 200

So, although out.nbytes reports 200 bytes (25 elements of 8 bytes each), the output occupies no memory of its own: it is a view into the extended array, whose buffer is just 72 bytes.
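To make the difference between the view and a real copy concrete, here is a small sketch along the same lines. It uses np.shares_memory, which performs an exact check (np.may_share_memory only checks memory bounds), and passes writeable=False to as_strided as a precaution, since writing through overlapping windows would change several rows at once. The variable names view and copied are chosen for illustration.

```python
import numpy as np

a = np.array([37, 95, 87, 10, 35])
L = a.size
a_ext = np.concatenate((np.full(L - 1, np.nan), a))
n = a_ext.strides[0]

# Read-only strided view into a_ext (no new data buffer is allocated)
view = np.lib.stride_tricks.as_strided(
    a_ext, shape=(L, L), strides=(n, n), writeable=False
)
# An explicit copy materializes all L*L elements
copied = view.copy()

print(a_ext.nbytes, view.nbytes, copied.nbytes)  # 72 200 200
print(np.shares_memory(a_ext, view))             # True
print(np.shares_memory(a_ext, copied))           # False
```

Both view and copied report 200 bytes through nbytes, but only the copy actually owns that much memory.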


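One more approach with SciPy's toeplitz -

import numpy as np
from scipy.linalg import toeplitz

# First column is a, NaNs fill everything above the diagonal; flipping the columns moves the NaNs to the left
out = toeplitz(a, np.full(a.size, np.nan))[:, ::-1]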