Should I normalize training and test sets separately?

I want to normalize my data to the range [0, 1]. Should I normalize the data after shuffling and splitting? Should I repeat the same procedure for the test set? I came across some Python code that uses this kind of normalization. Is this the correct way to normalize data to the target range [0, 1]?

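```python
import numpy as np

X_train = np.array([[ 1., -1.,  2.], [ 2.,  0.,  0.],[ 0.,  1., -1.]])
a = X_train
lis = []  # per-column results (this initialization was missing)
for i in range(3):  # loop over the 3 feature columns
    old_range = np.amax(a[:,i]) - np.amin(a[:,i])
    new_range = 1 - 0  # target range [0, 1]
    f = ((a[:,i] - np.amin(a[:,i])) / old_range)*new_range + 0
    lis.append(f)
b = np.transpose(np.array(lis))  # put the columns back in place
print(b)
```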

Here is my result after normalization.

```
[[0.5, 0., 1.]
 [1., 0.5, 0.33333333]
 [0., 1., 0.]]
```
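The same per-column min-max rescaling can be written without the explicit loop, using NumPy's column-wise `min`/`max` and broadcasting. A minimal sketch, assuming the same `X_train` and that no column is constant (a constant column would make the range zero and divide by zero):

```python
import numpy as np

X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])

# axis=0 reduces over rows, giving one min and one max per feature column
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)

# Broadcasting subtracts/divides each column by its own statistics.
# Assumes col_max - col_min is non-zero for every column.
b = (X_train - col_min) / (col_max - col_min)
print(b)
```

This produces the same matrix as the loop above.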

Tags: python machine-learning

1 answer

男人必须洒脱

Reply #2 · 2019-07-13 12:21

Should I normalize data after shuffling and splitting?

Yes. Otherwise, you leak information from the future (i.e., from the test set) into training. More information here; that discussion is about standardization rather than normalization (and uses R rather than Python), but the arguments apply equally.
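A minimal sketch of the split-then-scale order, assuming a hypothetical random toy dataset `X` and scikit-learn's `train_test_split` for the shuffle and split. The min/max used for scaling come from the training rows only; the "leaky" statistics are computed just to show what would otherwise include test rows:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # hypothetical toy dataset, 100 rows x 3 features

# Shuffle and split first; random_state only makes the split reproducible
X_train, X_test = train_test_split(X, test_size=0.2, shuffle=True, random_state=42)

# Leaky: statistics computed on all rows, including the future test rows
leaky_min, leaky_max = X.min(axis=0), X.max(axis=0)

# Correct: statistics computed on the training rows only
train_min, train_max = X_train.min(axis=0), X_train.max(axis=0)

# Both sets are scaled with the training statistics
X_train_scaled = (X_train - train_min) / (train_max - train_min)
X_test_scaled = (X_test - train_min) / (train_max - train_min)
```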

Should I repeat the same procedure for the test set?

Yes, but use the scaler that was fitted to the training dataset. Here, that means scaling the test dataset with the max and min of the training dataset. This keeps the test data consistent with the transformation applied to the training data and makes it possible to evaluate whether the model generalizes well.
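Done by hand with NumPy, "fit on train, transform both" looks like the sketch below, using the same toy arrays as the sklearn example that follows:

```python
import numpy as np

X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
X_test = np.array([[0., -1., 1.5], [2.5, 0., 1.]])

# "Fit": learn the per-feature min and max from the training set only
train_min = X_train.min(axis=0)
train_max = X_train.max(axis=0)
train_range = train_max - train_min

# "Transform": reuse the training statistics for both sets
X_train_scaled = (X_train - train_min) / train_range
X_test_scaled = (X_test - train_min) / train_range
print(X_test_scaled)
```

The first test row's first feature (2.5) is larger than the training maximum (2.0), so it scales to 1.25. Scaled test values are therefore not guaranteed to stay inside [0, 1]; that is expected and is not a reason to refit on the test set.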

You do not have to code it from scratch. Using sklearn:

```python
import numpy as np
from sklearn import preprocessing

X_train = np.array([[ 1., -1.,  2.], [ 2.,  0.,  0.],[ 0.,  1., -1.]])
X_test = np.array([[ 0, -1.,  1.5], [ 2.5,  0.,  1]])

# Learn min and max from the training data only
scaler = preprocessing.MinMaxScaler()
scaler = scaler.fit(X_train)

# Apply the training min/max to both sets
X_train_minmax = scaler.transform(X_train)
X_test_minmax = scaler.transform(X_test)
```
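To check that the fitted scaler really uses the training statistics, you can inspect its `data_min_` and `data_max_` attributes and compare against a manual computation. A minimal sketch with the same arrays:

```python
import numpy as np
from sklearn import preprocessing

X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
X_test = np.array([[0., -1., 1.5], [2.5, 0., 1.]])

scaler = preprocessing.MinMaxScaler().fit(X_train)

# Per-feature statistics learned during fit (from X_train only)
print(scaler.data_min_)
print(scaler.data_max_)

X_test_minmax = scaler.transform(X_test)

# Same result as applying the training min/max by hand
manual = (X_test - X_train.min(axis=0)) / (X_train.max(axis=0) - X_train.min(axis=0))
assert np.allclose(X_test_minmax, manual)

# inverse_transform maps scaled values back to the original units
restored = scaler.inverse_transform(X_test_minmax)
```

If out-of-range test values are a problem downstream, scikit-learn 0.24 and later accept `MinMaxScaler(clip=True)`, which clips transformed values to the feature range; whether clipping is appropriate depends on the model.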

Note: for most applications, standardization (`preprocessing.StandardScaler()`) is the recommended approach to scaling.
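The same fit-on-train pattern applies to `StandardScaler`. A minimal sketch reusing the toy arrays: the scaler learns each feature's training mean and standard deviation, and the test set is transformed with those same statistics:

```python
import numpy as np
from sklearn import preprocessing

X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
X_test = np.array([[0., -1., 1.5], [2.5, 0., 1.]])

# mean_ and scale_ are computed from X_train only (population std, ddof=0)
scaler = preprocessing.StandardScaler().fit(X_train)

X_train_std = scaler.transform(X_train)  # zero mean, unit variance per feature
X_test_std = scaler.transform(X_test)    # uses the training mean_ and scale_
```

Only the training set is guaranteed to end up with zero mean and unit variance; the test set's statistics will generally differ, which is expected.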
