Regression along a dimension in a numpy array

Posted 2020-07-09 08:50

Question:

I've got a 4-dimensional numpy array (x,y,z,time) and would like to do a numpy.polyfit through the time dimension, at each x,y,z coordinate. For example:

import numpy as np
n = 10       # size of my x,y,z dimensions
degree = 2   # degree of my polyfit
time_len = 5 # number of time samples

# Make some data
A = np.random.rand(n, n, n, time_len)

# An x vector to regress through evenly spaced samples
X = np.arange( time_len )

# A placeholder for the regressions
regressions = np.zeros((n, n, n, degree + 1))

# Loop over each index in the array (slow!)
for row in range(A.shape[0]):
    for col in range(A.shape[1]):
        for dep in range(A.shape[2]):  # "dep" avoids shadowing the builtin slice
            fit = np.polyfit(X, A[row, col, dep, :], degree)
            regressions[row, col, dep] = fit

I'd like to get to the regressions array without having to go through all of the looping. Is this possible?

Answer 1:

Reshape your data such that each individual slice is on a column of a 2d array. Then run polyfit once.

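# Time is the last axis of A, so flatten x,y,z and transpose:
A2 = A.reshape(-1, time_len).T             # shape (time_len, n*n*n)
regressions = np.polyfit(X, A2, degree)    # shape (degree+1, n*n*n)
# Move the coefficients to the last axis and restore x,y,z
regressions = regressions.T.reshape(A.shape[:-1] + (degree + 1,))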

Adjust the reshapes to match your layout; I don't really understand what all of the dimensions correspond to in your dataset, so I'm not sure exactly what shape you will want. The point is that each individual dataset for polyfit should occupy a column of the matrix A2, with the time samples running down the rows.
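
To check that the reshaping really lines up each time series with one column, here is a minimal sketch that compares a single vectorized polyfit call against the explicit loop. It assumes time is the last axis, as in the question; the names vectorized and looped, the smaller n, and the seeded generator are illustrative choices, not part of the original post.

```python
import numpy as np

n, degree, time_len = 4, 2, 5            # small sizes, chosen only for the check
rng = np.random.default_rng(0)           # seeded so the run is repeatable
A = rng.random((n, n, n, time_len))
X = np.arange(time_len)

# Vectorized: time runs down the rows, one column per (x, y, z) point
A2 = A.reshape(-1, time_len).T                       # shape (time_len, n*n*n)
coeffs = np.polyfit(X, A2, degree)                   # shape (degree+1, n*n*n)
vectorized = coeffs.T.reshape(n, n, n, degree + 1)   # coefficients on last axis

# Reference: one polyfit call per (x, y, z) point
looped = np.empty((n, n, n, degree + 1))
for i, j, k in np.ndindex(n, n, n):
    looped[i, j, k] = np.polyfit(X, A[i, j, k], degree)

# Both approaches should agree up to floating-point error
assert np.allclose(vectorized, looped)
```

If the assertion fails after you adapt this to your own data, the most likely cause is that the reshape does not match the axis that actually holds the time samples.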

By the way, if you are interested in performance, you should measure it, for example with the profile (or cProfile) module or with timeit. You can't always predict how quickly code will run just by eyeballing it; you have to run it. In this case, though, removing the loops also makes the code much easier to read, which is even more important.
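
As a rough sketch of the "you have to run it" advice, the snippet below times both approaches with the standard-library timeit module. The function names fit_loop and fit_vectorized and the array sizes are hypothetical; actual timings depend on your machine and NumPy build, so no numbers are claimed here.

```python
import timeit

import numpy as np

n, degree, time_len = 6, 2, 5            # illustrative sizes
A = np.random.default_rng(0).random((n, n, n, time_len))
X = np.arange(time_len)

def fit_loop():
    """One np.polyfit call per (x, y, z) point."""
    out = np.empty((n, n, n, degree + 1))
    for idx in np.ndindex(n, n, n):
        out[idx] = np.polyfit(X, A[idx], degree)
    return out

def fit_vectorized():
    """A single np.polyfit call on a (time_len, n*n*n) matrix."""
    A2 = A.reshape(-1, time_len).T
    return np.polyfit(X, A2, degree).T.reshape(n, n, n, degree + 1)

# Total time for 3 runs of each version
t_loop = timeit.timeit(fit_loop, number=3)
t_vec = timeit.timeit(fit_vectorized, number=3)
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s")
```

For a per-function breakdown rather than a single total, cProfile.run("fit_loop()") is another option.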



Tags: python numpy