Question:

I have the following df:

Date       Event_Counts   Category_A  Category_B
20170401      982457          0           1
20170402      982754          1           0
20170402      875786          0           1

I am preparing the data for a regression analysis and want to standardize the column Event_Counts so that it is on a scale similar to the categories.

I use the following code:

from sklearn import preprocessing
df['scaled_event_counts'] = preprocessing.scale(df['Event_Counts'])

While I do get this warning:

DataConversionWarning: Data with input dtype int64 was converted to float64 by the scale function.
  warnings.warn(msg, _DataConversionWarning)

it seems to have worked; there is a new column. However, it has negative numbers like -1.3

What I thought the scale function does is subtract the mean from the number and divide it by the standard deviation for every row; then add the min of the result to every row.

Does it not work for pandas that way? Or should I use the normalize() function or the StandardScaler() function? I wanted to have the standardized column on a scale of 0 to 1.

Thank You

Answer 1:

I think you are looking for the sklearn.preprocessing.MinMaxScaler. That will allow you to scale to a given range.

So in your case it would be:

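from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
# fit_transform expects 2-D input, so select the column with double brackets
df['scaled_event_counts'] = scaler.fit_transform(df[['Event_Counts']]).ravel()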

To scale the entire df:

scaled_df = scaler.fit_transform(df)
print(scaled_df)
[[ 0.          0.99722347  0.          1.        ]
 [ 1.          1.          1.          0.        ]
 [ 1.          0.          0.          1.        ]]
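As a hedged, self-contained sketch of the single-column case, the snippet below rebuilds the sample frame from the question and checks the scaler's result against the min-max formula (x - min) / (max - min). The variable names `scaled` and `manual` are illustrative only, and the explicit `astype(float)` is an assumption about how one might avoid the integer-conversion warning, not something the answer prescribes.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": [20170401, 20170402, 20170402],
    "Event_Counts": [982457, 982754, 875786],
    "Category_A": [0, 1, 0],
    "Category_B": [1, 0, 1],
})

scaler = MinMaxScaler(feature_range=(0, 1))

# Double brackets select a one-column DataFrame (2-D), which the scaler expects.
# Casting to float first makes the int -> float conversion explicit.
scaled = scaler.fit_transform(df[["Event_Counts"]].astype(float))
df["scaled_event_counts"] = scaled.ravel()

# The same result computed by hand: (x - min) / (max - min).
counts = df["Event_Counts"]
manual = (counts - counts.min()) / (counts.max() - counts.min())

print(df[["Event_Counts", "scaled_event_counts"]])
```

Scaling only the count column leaves Date and the 0/1 category columns untouched, which is usually what a regression setup like the one in the question needs.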

Answer 2:

preprocessing.scale standardizes each feature (column) by subtracting its mean and dividing by its standard deviation. So,

scaled_event_counts = (Event_Counts - mean(Event_Counts)) / std(Event_Counts)

The int64-to-float64 warning appears because subtracting the mean produces floating-point values, so the integer column has to be converted to float first.

You will have negative numbers in the scaled column because the mean is shifted to zero, so every value below the original mean becomes negative.
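The formula above can be checked with a minimal sketch that uses the three counts from the question. It assumes the population standard deviation (`ddof=0`), which is what NumPy's default `std` computes and what `preprocessing.scale` matches here. Note that pandas' `Series.std` defaults to `ddof=1`, so it is set explicitly. The names `scaled` and `manual` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Event counts from the question, cast to float to avoid the conversion warning.
counts = pd.Series([982457, 982754, 875786], name="Event_Counts").astype(float)

# Standardize with scikit-learn.
scaled = preprocessing.scale(counts.to_numpy())

# Standardize by hand: subtract the mean, divide by the population std (ddof=0).
manual = (counts - counts.mean()) / counts.std(ddof=0)

print(np.allclose(scaled, manual))   # both approaches agree
print(scaled)                        # the count below the mean is negative
```

The result has mean 0 and unit standard deviation rather than a 0 to 1 range, which is why the smallest count comes out negative.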

python pandas standardize column for regression
