Normalize data before or after split of training a

Posted 2019-03-19 03:59

I want to split my data into training and test sets. Should I apply normalization before or after the split? Does it make any difference when building a predictive model? Thanks in advance.

Tags: machine-learning split regression normalization train-test-split

2 answers

forever°为你锁心

#2 · 2019-03-19 04:36

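You can fit the normalizer on the training data only, then use it to transform both sets.

Learn (fit):

from sklearn import preprocessing
# Note: Normalizer rescales each sample (row) to unit norm, so fit() learns no statistics.
# For per-feature mean/std scaling, preprocessing.StandardScaler follows the same pattern.
normalizer = preprocessing.Normalizer().fit(xtrain)

Transform:

xtrainnorm = normalizer.transform(xtrain)
xtestnorm = normalizer.transform(xtest)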


看我几分像从前

#3 · 2019-03-19 04:42

You first need to split the data into training and test sets (a validation set might also be required).

Don't forget that the test data points stand in for real-world data the model has not seen. Feature normalization (or data standardization) of the explanatory (or predictor) variables is a technique used to center and scale the data by subtracting the mean and dividing by the standard deviation. If you compute the mean and standard deviation over the whole dataset, information from the test set leaks into the training explanatory variables through those statistics.

Therefore, you should compute the normalization statistics on the training data only. Then normalize the test instances as well, but using the mean and standard deviation of the training explanatory variables. This way, we can evaluate whether the model generalizes well to new, unseen data points.
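
As a rough illustration of the order of operations described above (split first, fit on the training part, apply to both), here is a minimal sketch using only the Python standard library. The dataset, the 80/20 split, and the helper names `fit_standardizer` and `standardize` are made up for this example and are not from any library.

```python
import random
import statistics


def fit_standardizer(column):
    """Compute the mean and population standard deviation of one feature column."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    if std == 0:
        std = 1.0  # avoid division by zero for a constant feature
    return mean, std


def standardize(column, mean, std):
    """Apply (x - mean) / std using statistics that were fitted elsewhere."""
    return [(x - mean) / std for x in column]


# Hypothetical single-feature dataset (assumption for illustration only).
random.seed(0)
data = [random.gauss(50.0, 10.0) for _ in range(100)]

# 1. Split first, so the test part plays no role in the statistics.
random.shuffle(data)
train, test = data[:80], data[80:]

# 2. Fit the mean and standard deviation on the training part only.
train_mean, train_std = fit_standardizer(train)

# 3. Transform both parts with the training statistics.
train_scaled = standardize(train, train_mean, train_std)
test_scaled = standardize(test, train_mean, train_std)
```

After this, the scaled training values have mean 0 and standard deviation 1 by construction, while the scaled test values generally do not, which is expected. scikit-learn's `StandardScaler` follows the same pattern: `fit` on the training set, then `transform` on both.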
