I have the following df:
Date Event_Counts Category_A Category_B
20170401 982457 0 1
20170402 982754 1 0
20170402 875786 0 1
I am preparing the data for a regression analysis and want to standardize the Event_Counts column so that it is on a similar scale to the category columns.
I use the following code:
from sklearn import preprocessing
df['scaled_event_counts'] = preprocessing.scale(df['Event_Counts'])
While I do get this warning:
DataConversionWarning: Data with input dtype int64 was converted to float64 by the scale function.
warnings.warn(msg, _DataConversionWarning)
it seems to have worked; there is a new column. However, it contains negative numbers such as -1.3.
I thought the scale function subtracts the mean from each value and divides by the standard deviation, and then adds the minimum of the result to every row.
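To make the first part of that arithmetic concrete, here is a minimal plain-Python sketch of standardization (subtract the mean, divide by the standard deviation) applied to the three Event_Counts values shown above. The helper name zscore is made up for illustration, and the sketch assumes the population standard deviation, which, as far as I can tell, is what preprocessing.scale uses by default. Because the result is centred on zero, some values necessarily come out negative.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize: subtract the mean, then divide by the population std."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption: ddof=0)
    return [(v - mu) / sigma for v in values]

# The three Event_Counts values from the sample rows
event_counts = [982457, 982754, 875786]
scaled = zscore(event_counts)
print(scaled)  # values are centred on 0, so at least one is negative
```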
Does it not work that way for pandas? Or should I use normalize() or StandardScaler() instead? I wanted the standardized column to be on a scale of 0 to 1.
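For the 0-to-1 range, the usual arithmetic is min-max scaling: subtract the column minimum and divide by the range. Below is a minimal plain-Python sketch of that idea using the same three Event_Counts values. The helper name min_max is made up for illustration, and the sketch assumes the values are not all identical (otherwise the range would be zero). scikit-learn provides the same kind of rescaling through preprocessing.MinMaxScaler and preprocessing.minmax_scale.

```python
def min_max(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo  # assumption: not all values are equal, so span != 0
    return [(v - lo) / span for v in values]

# The three Event_Counts values from the sample rows
event_counts = [982457, 982754, 875786]
scaled = min_max(event_counts)
print(scaled)  # every value lies between 0 and 1 inclusive
```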
Thank you.