Calculating Slopes in Numpy (or Scipy)

Asked 2020-05-31 17:04

I am trying to find the fastest and most efficient way to calculate slopes using Numpy and Scipy. I have a data set of three Y variables and one X variable, and I need to calculate their individual slopes. For example, I can easily do this one row at a time, as shown below, but I was hoping there was a more efficient way. I also don't think linregress is the best way to go, because I don't need any of the auxiliary values like the intercept, standard error, etc. in my results. Any help is greatly appreciated.

    import numpy as np
    from scipy import stats

    Y = np.array([[2.62710000e+11, 3.14454000e+11, 3.63609000e+11, 4.03196000e+11,
                   4.21725000e+11, 2.86698000e+11, 3.32909000e+11, 4.01480000e+11,
                   4.21215000e+11, 4.81202000e+11],
                  [3.11612352e+03, 3.65968334e+03, 4.15442691e+03, 4.52470938e+03,
                   4.65011423e+03, 3.10707392e+03, 3.54692896e+03, 4.20656404e+03,
                   4.34233412e+03, 4.88462501e+03],
                  [2.21536396e+01, 2.59098311e+01, 2.97401268e+01, 3.04784552e+01,
                   3.13667639e+01, 2.76377113e+01, 3.27846013e+01, 3.73223417e+01,
                   3.51249997e+01, 4.42563658e+01]])
    X = np.array([1990., 1991., 1992., 1993., 1994., 1995., 1996., 1997., 1998., 1999.])

    slope_0, intercept, r_value, p_value, std_err = stats.linregress(X, Y[0, :])
    slope_1, intercept, r_value, p_value, std_err = stats.linregress(X, Y[1, :])
    slope_2, intercept, r_value, p_value, std_err = stats.linregress(X, Y[2, :])
    slope_0 = slope_0 / Y[0, 0]
    slope_1 = slope_1 / Y[1, 0]
    slope_2 = slope_2 / Y[2, 0]
    b, a = np.polyfit(X, Y[1, :], 1)
    slope_1_a = b / Y[1, 0]

8 Answers
趁早两清 · #2 · 2020-05-31 17:14

I built upon the other answers and the original regression formula to build a function that works for any tensor. It calculates the slopes of the data along the given axis. So, if you have arbitrary tensors X[i,j,k,l], Y[i,j,k,l] and you want to know the slopes along the data in the third axis, you can call it with calcSlopes( X, Y, axis = 2 ).

import numpy as np

def calcSlopes( x = None, y = None, axis = -1 ):
    assert x is not None or y is not None

    # assume that the single given data argument contains equally
    # spaced y-values (as in matplotlib's plot command)
    if y is None:
        y = x
        x = None

    # move the axis we want the slopes of to the front,
    # as is necessary for subtracting the means;
    # note that this axis 'vanishes' anyway, so there is no need to swap it back
    y = np.swapaxes( y, axis, 0 )
    if x is not None:
        x = np.swapaxes( x, axis, 0 )

    # https://en.wikipedia.org/wiki/Simple_linear_regression
    # beta = sum_i ( X_i - <X> ) ( Y_i - <Y> ) / ( sum_i ( X_i - <X> )^2 )
    if x is None:
        # axis with values to reduce must be trailing for broadcast_to,
        # therefore transpose
        x = np.broadcast_to( np.arange( y.shape[0] ), y.T.shape ).T
        x = x - ( x.shape[0] - 1 ) / 2. # mean of (0,1,...,n-1) is n*(n-1)/2/n
    else:
        x = x - np.mean( x, axis = 0 )
    y = y - np.mean( y, axis = 0 )

    # beta = sum_i x_i y_i / sum_i x_i^2  (x and y are now mean-centered)
    slopes = np.sum( np.multiply( x, y ), axis = 0 ) / np.sum( x**2, axis = 0 )

    return slopes

As a convenience, it also works when only equally spaced y data is given, in which case the x values are assumed to be 0, 1, 2, and so on. So for example:

y = np.array( [
    [ 1, 2, 3, 4 ],
    [ 2, 4, 6, 8 ]
] )

print( calcSlopes( y, axis = 0 ) )
print( calcSlopes( y, axis = 1 ) )

x = np.array( [
    [ 0, 2, 4, 6 ],
    [ 0, 4, 8, 12 ]
] )

print( calcSlopes( x, y, axis = 1 ) )

Output:

[1. 2. 3. 4.]
[1. 2.]
[0.5 0.5]
(account banned) · #3 · 2020-05-31 17:16

The fastest and most efficient way would be to use scipy's native linregress function, which calculates everything at once:

slope : slope of the regression line

intercept : intercept of the regression line

r-value : correlation coefficient

p-value : two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero

stderr : Standard error of the estimate

And here is an example:

a = [15, 12, 8, 8, 7, 7, 7, 6, 5, 3]
b = [10, 25, 17, 11, 13, 17, 20, 13, 9, 15]
from scipy.stats import linregress
linregress(a, b)

will return you:

LinregressResult(slope=0.20833333333333337, intercept=13.375, rvalue=0.14499815458068521, pvalue=0.68940144811669501, stderr=0.50261704627083648)

P.S. Just the mathematical formula for the slope:

    slope = sum_i (x_i - mean(x)) * (y_i - mean(y)) / sum_i (x_i - mean(x))^2
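That simple-regression slope formula can be checked numerically in NumPy, using the same a and b as above:

```python
import numpy as np

a = np.array([15, 12, 8, 8, 7, 7, 7, 6, 5, 3])
b = np.array([10, 25, 17, 11, 13, 17, 20, 13, 9, 15])

# slope = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2)
slope = ((a - a.mean()) * (b - b.mean())).sum() / ((a - a.mean()) ** 2).sum()
print(slope)  # ~0.208333, matching the linregress slope above
```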

chillily · #4 · 2020-05-31 17:19

With X and Y defined the same way as in your question, you can use:

import numpy

dY = (numpy.roll(Y, -1, axis=1) - Y)[:, :-1]
dX = (numpy.roll(X, -1, axis=0) - X)[:-1]

slopes = dY / dX

numpy.roll() aligns each observation with the one that follows it; you just need to drop the last column, which holds the unhelpful difference between the last and first observations. Then you can calculate all the slopes at once, without scipy.

In your example, dX is always 1, so you can save more time by computing slopes = dY.

我命由我不由天 · #5 · 2020-05-31 17:20

A representation that's simpler than the accepted answer:

import numpy as np

x = np.linspace(0, 10, 11)
y = np.linspace(0, 20, 11)
y = np.c_[y, y, y]

X = x - x.mean()
Y = y - y.mean(axis=0)  # center each column (series) of y

slope = X.dot(Y) / X.dot(X)

The equation for the slope comes from the vector notation for the simple-regression slope of a line: the dot product of the centered x with the centered y, divided by the dot product of the centered x with itself.
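The same vector form gives all slopes at once for data shaped like the question's (series stored as rows of Y, one shared X), so the means are taken along axis=1 here; the sample Y below is a made-up stand-in:

```python
import numpy as np

X = np.arange(1990.0, 2000.0)                      # the question's x values
Y = np.vstack([X + 5.0, 2.0 * X - 3.0, 3.0 * X])   # stand-in rows with slopes 1, 2, 3

Xc = X - X.mean()
Yc = Y - Y.mean(axis=1, keepdims=True)             # center each row of Y

slopes = Yc.dot(Xc) / Xc.dot(Xc)                   # one slope per row
print(slopes)  # [1. 2. 3.]
```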

地球回转人心会变 · #6 · 2020-05-31 17:21

As said before, you can use scipy's linregress. Here is how to get just the slope out:

    from scipy.stats import linregress

    x=[1,2,3,4,5]
    y=[2,3,8,9,22]

    slope, intercept, r_value, p_value, std_err = linregress(x, y)
    print(slope)

Keep in mind that doing it this way takes longer than calculating only the slope manually, since extra values like r_value and p_value are computed as well. However, linregress is pretty quick.

Source: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html
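For the question's multi-row Y there is no vectorized linregress, but a list comprehension can keep only the slope per row; the sample Y here is made up:

```python
import numpy as np
from scipy.stats import linregress

X = np.arange(1990.0, 2000.0)
Y = np.vstack([2.0 * X + 1.0, -0.5 * X + 3.0])  # made-up rows with slopes 2 and -0.5

# one linregress call per row, keeping only the slope attribute
slopes = [linregress(X, row).slope for row in Y]
print(slopes)  # close to [2.0, -0.5]
```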

三岁会撩人 · #7 · 2020-05-31 17:24

The way I did it is using the np.diff() function:

dx = np.diff(xvals)
dy = np.diff(yvals)

slopes = dy / dx
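As a runnable sketch with made-up data (note that this gives point-to-point slopes between consecutive samples, not a single fitted slope):

```python
import numpy as np

xvals = np.arange(1990.0, 2000.0)
yvals = 3.0 * xvals + 7.0   # a straight line with slope 3

dx = np.diff(xvals)
dy = np.diff(yvals)
slopes = dy / dx
print(slopes)  # nine values, all 3.0 for a straight line
```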
