The main goals are as follows:
1) Apply StandardScaler to continuous variables
2) Apply LabelEncoder and OneHotEncoder to categorical variables
The continuous variables need to be scaled, but a couple of the categorical variables are also of integer type, so the two kinds cannot be told apart by dtype alone. Applying StandardScaler to the whole DataFrame would also scale the integer-based categorical variables, which is not what we want.
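To make the undesired effect concrete, here is a minimal sketch on a toy frame. The column names and values are made up for illustration and are not taken from the real dataset; it assumes pandas and scikit-learn are installed.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical values): `season` is an integer-coded category,
# `temp` is a genuinely continuous variable.
df = pd.DataFrame({
    "season": [1, 2, 3, 4],
    "temp": [9.84, 14.76, 22.14, 30.34],
})

# Scaling the whole frame treats every numeric column the same way.
scaled = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)

# The season codes are no longer 1..4: they have been centered and rescaled,
# so they can no longer be read as category labels.
print(scaled["season"].tolist())
```

The scaler has no way of knowing that season is a label rather than a quantity, so the columns have to be selected explicitly before scaling.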
Since continuous and categorical variables are mixed in a single pandas DataFrame, what's the recommended workflow for this kind of problem?
The best example to illustrate my point is the Kaggle Bike Sharing Demand dataset, where season and weather are integer categorical variables.
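One possible workflow, sketched here under assumptions rather than offered as the definitive answer, is to list the columns of each kind by name and route them through scikit-learn's ColumnTransformer (available from scikit-learn 0.20). The toy values and the categorical_cols / continuous_cols lists below are illustrative; only the season and weather column names come from the dataset mentioned above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in for the bike sharing data (hypothetical values).
df = pd.DataFrame({
    "season": [1, 2, 3, 4, 1, 2],
    "weather": [1, 1, 2, 3, 2, 1],
    "temp": [9.84, 14.76, 22.14, 30.34, 10.66, 15.58],
    "humidity": [81, 75, 60, 45, 80, 70],
})

# Columns are assigned by name, since dtype alone cannot separate them.
categorical_cols = ["season", "weather"]
continuous_cols = ["temp", "humidity"]

preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), continuous_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Output columns follow the transformer order: scaled continuous first,
# then the one-hot columns for each categorical variable.
X = preprocess.fit_transform(df)
print(X.shape)
```

In this sketch OneHotEncoder is applied directly to the integer codes, so a separate LabelEncoder step is not used; LabelEncoder is documented as a tool for encoding target labels rather than input features.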