Linear Regression Returns Different Results Than Synthetic Parameters

Posted 2019-09-20 06:18

Question:

I am trying this code:

from sklearn import linear_model
import numpy as np

x1 = np.arange(0,10,0.1)
x2 = x1*10

y = 2*x1 + 3*x2
X = np.vstack((x1, x2)).transpose()

reg_model = linear_model.LinearRegression()
reg_model.fit(X,y)

print(reg_model.coef_)
# should be [2, 3]

print(reg_model.predict([[5, 6]]))
# should be 2*5 + 3*6 = 28

print(reg_model.intercept_)
# perfectly at the expected value of 0

print(reg_model.score(X, y))
# seems to be rather confident to be right

The results are

  • coef_: [ 0.31683168 3.16831683]
  • predict: 20.5940594059
  • intercept_: 0.0
  • score: 1.0

and therefore not what I expected: the coefficients are not the same as the parameters used to synthesize the data. Why is this so?

Answer 1:

Your problem is the non-uniqueness of the solution. Because x2 = 10*x1, the two columns of X are perfectly collinear (a linear transform of one feature does not create new information in the eyes of this model), so infinitely many coefficient pairs fit your data exactly: y = 2*x1 + 3*x2 = 32*x1, and any pair (a, b) with a + 10*b = 32 reproduces y. The pair you got is the minimum-norm one, a = 32/101 ≈ 0.3168 and b = 320/101 ≈ 3.1683, which is consistent with scikit-learn's SVD-based least-squares solver returning the minimum-norm solution for an underdetermined system (a short check below the outputs makes this concrete). Apply a non-linear transformation to your second dimension instead and you will see the desired output:

from sklearn import linear_model
import numpy as np

x1 = np.arange(0,10,0.1)
x2 = x1**2
X = np.vstack((x1, x2)).transpose()
y = 2*x1 + 3*x2

reg_model = linear_model.LinearRegression()
reg_model.fit(X,y)
print(reg_model.coef_)
# should be [2, 3]

print(reg_model.predict([[5, 6]]))
# should be 2*5 + 3*6 = 28

print(reg_model.intercept_)
# perfectly at the expected value of 0

print(reg_model.score(X, y))

Outputs are

  • coef_: [ 2. 3.]
  • predict: [ 28.]
  • intercept_: -2.84217094304e-14
  • score: 1.0
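
For completeness, here is a minimal sketch (reusing the question's setup; the minimum-norm claim is an assumption about scikit-learn's dense least-squares solver) that makes the degeneracy concrete:

import numpy as np

x1 = np.arange(0, 10, 0.1)
x2 = x1 * 10                        # perfectly collinear with x1
X = np.vstack((x1, x2)).transpose()
y = 2*x1 + 3*x2

# The design matrix has rank 1, not 2, so the least-squares
# problem has infinitely many exact solutions:
print(np.linalg.matrix_rank(X))     # 1

# Any pair (a, b) with a + 10*b = 32 reproduces y exactly:
print(np.allclose(X @ [32, 0], y))  # True
print(np.allclose(X @ [2, 3], y))   # True

# The coefficients the question observed match the minimum-norm
# solution of a + 10*b = 32 (assumed behaviour of the SVD-based
# solver): a = 32/101, b = 320/101
print(32/101, 320/101)              # 0.31683168... 3.16831683...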