I'm trying this code:
from sklearn import linear_model
import numpy as np
x1 = np.arange(0,10,0.1)
x2 = x1*10
y = 2*x1 + 3*x2
X = np.vstack((x1, x2)).transpose()
reg_model = linear_model.LinearRegression()
reg_model.fit(X,y)
print(reg_model.coef_)
# should be [2,3]
print(reg_model.predict([[5, 6]]))  # predict expects a 2D array: one row per sample
# should be 2*5 + 3*6 = 28
print(reg_model.intercept_)
# perfectly at the expected value of 0
print(reg_model.score(X, y))
# seems rather confident that it is right
The results are
- [ 0.31683168 3.16831683]
- 20.5940594059
- 0.0
- 1.0
and therefore not what I expected: they do not match the parameters used to synthesize the data. Why is this so?
Your problem is the uniqueness of the solution: your two dimensions are linearly dependent (x2 = 10*x1, and applying a linear transform to one dimension does not create unique data in the eyes of this model), so there are infinitely many coefficient pairs that fit your data perfectly, and the solver simply returns one of them. Apply a non-linear transformation to your second dimension instead and you will see the desired output; a quick check of the degeneracy follows, then the corrected code.
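As a quick sanity check (a minimal sketch using the numbers from your output): since x2 = 10*x1, the target collapses to y = 2*x1 + 3*(10*x1) = 32*x1, so every pair (w1, w2) with w1 + 10*w2 = 32 reproduces y exactly. The underlying least-squares routine returns the minimum-norm solution 32*(1, 10)/101, which is exactly what you got:

import numpy as np

# coefficients reported in the question (collinear case)
w1, w2 = 0.31683168, 3.16831683

# any (w1, w2) with w1 + 10*w2 = 32 fits y = 32*x1 exactly
print(w1 + 10 * w2)         # ~32.0, on the line of solutions

# the minimum-norm point on that line matches the fitted coefficients
print(32 / 101, 320 / 101)  # 0.31683168..., 3.16831683...

# and it reproduces the "wrong" prediction for [5, 6]
print(w1 * 5 + w2 * 6)      # 20.5940594...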
from sklearn import linear_model
import numpy as np
x1 = np.arange(0,10,0.1)
x2 = x1**2
X = np.vstack((x1, x2)).transpose()
y = 2*x1 + 3*x2
reg_model = linear_model.LinearRegression()
reg_model.fit(X,y)
print(reg_model.coef_)
# should be [2,3]
print(reg_model.predict([[5, 6]]))
# should be 2*5 + 3*6 = 28
print(reg_model.intercept_)
# perfectly at the expected value of 0
print(reg_model.score(X, y))
The outputs are:
[ 2.  3.]
[ 28.]
-2.84217094304e-14
1.0
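As an aside (a sketch, not part of the original answer): you can catch this kind of degeneracy before fitting by checking the rank of the design matrix, and if you must work with collinear features, a regularized model such as sklearn.linear_model.Ridge at least picks one well-defined solution out of the infinitely many:

import numpy as np
from sklearn.linear_model import Ridge

x1 = np.arange(0, 10, 0.1)
X = np.vstack((x1, 10 * x1)).transpose()  # the original, collinear design
y = 2 * x1 + 3 * (10 * x1)

# rank 1 with 2 columns -> the columns are linearly dependent
print(np.linalg.matrix_rank(X))  # 1

# a small ridge penalty makes the solution unique; it lands near the
# minimum-norm solution (~[0.317, 3.168]), not the [2, 3] used to
# synthesize the data, which collinear data cannot identify
ridge = Ridge(alpha=1e-6)
ridge.fit(X, y)
print(ridge.coef_)

The tiny alpha keeps the example close to the unregularized least-squares answer; larger values shrink both coefficients further toward zero.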