I am new to the concept of scaling a feature in Machine Learning. I read that scaling is useful when one feature's range is very large compared to the other features. But if I choose to scale the training data, then:
- Can I scale just the one feature that has a large range?
- If I scale the entire `X` of the train data, do I also need to scale the `y` of the train data and the entire test data?
Yes, you can scale a single feature. You can think of scaling as a way of giving each feature the same importance. For instance, imagine you have data about people and you describe your examples via two features: height and weight. If you measure height in meters and weight in kilograms, weight varies over tens of units while height varies over a fraction of one unit, so a k-Nearest Neighbours classifier computing the distance between two examples is likely to base its decisions almost entirely on weight. In that case, you can scale one of the features to the same range as the other. More commonly, we scale all the features to the same range (e.g. 0 to 1). In addition, remember that the values you compute to scale your training data (for example, each feature's minimum and maximum) must also be used to scale the test data; do not recompute them on the test set.
As for the dependent variable `y`, you do not need to scale it.
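To make the height/weight example concrete, here is a minimal sketch in plain Python. The data is made up for illustration, and the helpers `fit_min_max` and `transform` are hypothetical names, not part of any library. It shows two things: the raw distance between two people is almost entirely the weight difference, and the min/max learned from the training rows is reused as-is on the test rows.

```python
import math

def fit_min_max(rows):
    """Compute a (min, max) pair per feature from the training rows only."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]

def transform(rows, params):
    """Scale rows with previously fitted (min, max) pairs.

    Assumes every feature has max > min in the training data;
    a constant feature would cause a division by zero here.
    """
    return [
        [(value - lo) / (hi - lo) for value, (lo, hi) in zip(row, params)]
        for row in rows
    ]

# Made-up examples: [height in meters, weight in kilograms]
X_train = [[1.60, 50.0], [1.70, 70.0], [1.80, 90.0], [1.90, 110.0]]
X_test = [[1.75, 60.0]]

# Fit on the training data, then apply the same parameters to both sets.
params = fit_min_max(X_train)
X_train_scaled = transform(X_train, params)
X_test_scaled = transform(X_test, params)

# Distance between the shortest/lightest and tallest/heaviest person.
raw_distance = math.dist(X_train[0], X_train[3])
scaled_distance = math.dist(X_train_scaled[0], X_train_scaled[3])

print("raw distance:   ", raw_distance)     # dominated by the 60 kg gap
print("scaled distance:", scaled_distance)  # both features contribute equally
print("scaled test row:", X_test_scaled[0])
```

Before scaling, the 0.3 m height difference barely changes the distance next to the 60 kg weight difference; after scaling, both features span 0 to 1 on the training set and contribute equally. Note that test values can fall outside 0 to 1 if they lie outside the training range, which is expected. If you use scikit-learn, `MinMaxScaler` follows the same pattern: call `fit` (or `fit_transform`) on the training data and only `transform` on the test data.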