I want to normalize my data to the range [0, 1]. Should I normalize the data after shuffling and splitting? Should I repeat the same procedure for the test set? I came across Python code that used this kind of normalization. Is this the correct way to normalize data to a target range of [0, 1]?
`import numpy as np

X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
a = X_train
lis = []  # collects the scaled columns
for i in range(3):
    old_range = np.amax(a[:, i]) - np.amin(a[:, i])
    new_range = 1 - 0
    f = ((a[:, i] - np.amin(a[:, i])) / old_range) * new_range + 0
    lis.append(f)
b = np.transpose(np.array(lis))
print(b)`
Here is my result after normalization.
`[[0.5, 0., 1.]
[1., 0.5, 0.33333333]
[0., 1., 0.]]`
Yes. Otherwise, you are leaking information from the future (i.e., the test set here). More information here; it is about standardization rather than normalization (and R rather than Python), but the arguments apply equally.
Yes, using the scaler that was fitted to the training dataset. In this case, that means using the max and min from the training dataset to scale the test dataset. This keeps the transformation consistent with the one performed on the training data and makes it possible to evaluate whether the model generalizes well.
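To make this concrete, here is a minimal NumPy sketch of that idea, with a made-up `X_test` for illustration: the min and range are computed from the training data only and then reused on the test data.

```python
import numpy as np

X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
# Hypothetical test data, just for illustration
X_test = np.array([[0.5, 0., 1.], [3., 2., -2.]])

# "Fit": compute per-column min and range on the TRAINING data only
train_min = X_train.min(axis=0)
train_range = X_train.max(axis=0) - train_min

# "Transform": apply the training statistics to both sets
X_train_scaled = (X_train - train_min) / train_range
X_test_scaled = (X_test - train_min) / train_range

print(X_test_scaled)
```

Note that test values outside the training min/max will end up outside [0, 1]; that is expected and is exactly what the model would see at prediction time.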
You do not have to code it from scratch. Using sklearn:
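A short sketch of what that could look like with `MinMaxScaler` (the exact snippet is not shown above, so take this as one plausible version): fit on the training data, then reuse the fitted min/max on the test data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
X_test = np.array([[0.5, 0., 1.]])  # hypothetical test data

scaler = MinMaxScaler(feature_range=(0, 1))
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the fitted min/max
print(X_train_scaled)
print(X_test_scaled)
```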
Note: for most applications, standardization (`preprocessing.StandardScaler()`) is the recommended approach for scaling.