Using PCA on linear regression

Posted 2019-05-11 19:41

I want to use principal component analysis to reduce some noise before applying linear regression.

I have 1000 samples and 200 features:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA

X = np.random.rand(1000,200)
y = np.random.rand(1000,1)

With this data I can train my model:

model = LinearRegression()
model.fit(X, y)

But if I try the same after applying PCA

pca = PCA(n_components=8)
pca.fit(X)
principal_components = pca.components_  # shape (8, 200): one row per component

model.fit(principal_components,y)

I get this error:

ValueError: Found input variables with inconsistent numbers of samples: [8, 1000]

Tags: python machine-learning scikit-learn pca

1 Answer

冷血范

#2 · 2019-05-11 20:05

Try this:

pca = PCA(n_components=8)
X_pca = pca.fit_transform(X)

model.fit(X_pca,y)

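That is, fit_transform fits PCA to X and projects X onto the 8 components in one step, returning a (1000, 8) array, X_pca. Use that as the regression input instead of pca.components_. The components_ attribute holds the principal axes as an (8, 200) array, one row per component rather than one row per sample, which is why scikit-learn reports 8 samples against the 1000 in y.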
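To make the shape difference concrete, here is a minimal sketch using the same random data as the question. The seed, the 1-D y, the X_new array, and the make_pipeline variant are illustrative assumptions, not part of the original post. It also shows that new data has to be projected with the already-fitted PCA before predicting.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)   # seed is arbitrary, only for reproducibility
X = rng.random((1000, 200))      # 1000 samples, 200 features
y = rng.random(1000)             # 1-D target (assumption; the question used shape (1000, 1))

pca = PCA(n_components=8)
X_pca = pca.fit_transform(X)     # projected samples

print(pca.components_.shape)     # (8, 200): principal axes, not samples
print(X_pca.shape)               # (1000, 8): one row per sample

model = LinearRegression()
model.fit(X_pca, y)

# Hypothetical new data: reuse the fitted PCA via transform, do not refit it
X_new = rng.random((5, 200))
pred = model.predict(pca.transform(X_new))

# The same two steps wrapped in a pipeline, so the projection is applied automatically
pipe = make_pipeline(PCA(n_components=8), LinearRegression())
pipe.fit(X, y)
pred_pipe = pipe.predict(X_new)
```

The pipeline form is often convenient because it keeps the PCA fit and the regression fit together, which helps avoid fitting PCA on test data by accident during cross-validation.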
