OneHotEncoder categorical_features deprecated, ho

I need to convert a string-valued independent variable into a numeric representation, and I am using OneHotEncoder for the transformation. My dataset has many independent columns, some of which are:

Country     |    Age       
--------------------------
Germany     |    23
Spain       |    25
Germany     |    24
Italy       |    30

I need to encode the Country column like this:

0     |    1     |     2     |       3
--------------------------------------
1     |    0     |     0     |      23
0     |    1     |     0     |      25
1     |    0     |     0     |      24 
0     |    0     |     1     |      30

I managed to get the desired transformation using OneHotEncoder as follows:

#Encoding the categorical data
from sklearn.preprocessing import LabelEncoder

labelencoder_X = LabelEncoder()
X[:,0] = labelencoder_X.fit_transform(X[:,0])

#we are dummy encoding as the machine learning algorithms will be
#confused with the values like Spain > Germany > France
from sklearn.preprocessing import OneHotEncoder

onehotencoder = OneHotEncoder(categorical_features=[0])
X = onehotencoder.fit_transform(X).toarray()

Now I'm getting a deprecation message telling me to use categories='auto'. If I do so, the transformation is applied to all the independent columns (country, age, salary, etc.).

How can I apply the transformation to the 0th column of the dataset only?

Tags: python machine-learning one-hot-encoding

7 answers

地球回转人心会变

#2 · 2020-05-19 08:05

There are actually two warnings:

FutureWarning: The handling of integer data will change in version 0.22. Currently, the categories are determined based on the range [0, max(values)], while in the future they will be determined based on the unique values. If you want the future behaviour and silence this warning, you can specify "categories='auto'". In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.

and the second:

DeprecationWarning: The 'categorical_features' keyword is deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.

Going forward, you should not specify the columns in OneHotEncoder itself; passing categories='auto' only opts in to the new handling of integer data and silences the first warning. The first message also tells you that you can use OneHotEncoder directly, without running LabelEncoder first. Finally, the second message tells you to use ColumnTransformer, which is like a Pipeline for column transformations.

Here is the equivalent code for your case:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Each step is (name, transformer, columns); [0] is the list of columns to transform in this step
ct = ColumnTransformer([("Name_Of_Your_Step", OneHotEncoder(), [0])], remainder="passthrough")
X = ct.fit_transform(X)

For the example above:

# Encoding categorical data (basically changing text to numerical data, i.e. Country Name)
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer

# Encode Country column
labelencoder_X = LabelEncoder()
X[:,0] = labelencoder_X.fit_transform(X[:,0])
ct = ColumnTransformer([("Country", OneHotEncoder(), [0])], remainder = 'passthrough')
X = ct.fit_transform(X)
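To check the ColumnTransformer approach end to end, here is a minimal sketch that rebuilds the four rows from the question as a NumPy object array. The variable names (X, X_encoded) and the step name "country" are illustrative choices, not anything the library requires.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data from the question: column 0 is Country, column 1 is Age.
X = np.array(
    [["Germany", 23], ["Spain", 25], ["Germany", 24], ["Italy", 30]],
    dtype=object,
)

# One-hot encode column 0 only; pass every other column through unchanged.
ct = ColumnTransformer(
    [("country", OneHotEncoder(), [0])],
    remainder="passthrough",
)
X_encoded = ct.fit_transform(X)

# Depending on the data, the result may be a SciPy sparse matrix; densify if so.
if hasattr(X_encoded, "toarray"):
    X_encoded = X_encoded.toarray()

print(X_encoded)
```

Note that the one-hot columns follow the sorted category order (Germany, Italy, Spain), so Spain ends up in the third column rather than the second as in the table in the question.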


SAY GOODBYE

#3 · 2020-05-19 08:07

Don't use LabelEncoder; use OneHotEncoder directly.

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer
A = make_column_transformer(
    (OneHotEncoder(categories='auto'), [0]), 
    remainder="passthrough")

x=A.fit_transform(x)


我想做一个坏孩纸

#4 · 2020-05-19 08:15

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

transformer = ColumnTransformer(
    transformers=[
        ("Country",        # Just a name for the step
         OneHotEncoder(),  # The transformer instance
         [0]               # The column(s) to apply it to
         )
    ], remainder='passthrough'
)
X = transformer.fit_transform(X)

remainder='passthrough' keeps the other columns unchanged, while column 0 is replaced by its one-hot encoding.
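To make the effect of remainder concrete, the sketch below compares 'passthrough' with 'drop' (the default) on a small array. The salary values and the helper name encode_first_column are made up for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows: Country, Age, Salary (salary values are invented).
X_demo = np.array(
    [["Germany", 23, 50000], ["Spain", 25, 42000], ["Italy", 30, 61000]],
    dtype=object,
)


def encode_first_column(data, remainder):
    """One-hot encode column 0 and handle the rest according to `remainder`."""
    ct = ColumnTransformer(
        [("country", OneHotEncoder(), [0])],
        remainder=remainder,
    )
    out = ct.fit_transform(data)
    # Densify if the transformer returned a sparse matrix.
    return out.toarray() if hasattr(out, "toarray") else np.asarray(out)


kept = encode_first_column(X_demo, "passthrough")
dropped = encode_first_column(X_demo, "drop")

print(kept.shape)     # 3 one-hot columns + 2 passthrough columns
print(dropped.shape)  # 3 one-hot columns only
```

With 'passthrough', the untouched columns are appended after the encoded ones; with 'drop', they are discarded.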


疯言疯语

#5 · 2020-05-19 08:20

As of version 0.22, you can write the same thing as follows:

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([("Country", OneHotEncoder(), [0])], remainder = 'passthrough')
X = ct.fit_transform(X)

As you can see, you don't need to use LabelEncoder anymore.
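As a quick illustration of why LabelEncoder is no longer needed, this minimal sketch fits OneHotEncoder directly on a string column and inspects the categories it learned. The array name countries is an illustrative choice.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# A single string-valued column, shaped (n_samples, 1).
countries = np.array([["Germany"], ["Spain"], ["Germany"], ["Italy"]], dtype=object)

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default, so densify it for display.
one_hot = encoder.fit_transform(countries).toarray()

print(encoder.categories_)  # learned categories, one array per input column
print(one_hot)
```

The order of the output columns matches the order of the entries in encoder.categories_[0].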


太酷不给撩

#6 · 2020-05-19 08:23

You can also do one-hot encoding with pandas:

import pandas as pd
ohe=pd.get_dummies(dataframe_name['column_name'])

Give names to the newly formed columns and add them to your DataFrame. See the pandas documentation for get_dummies for details.
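One possible way to name the new columns and attach them to the DataFrame is sketched below, using the rows from the question. The DataFrame name df and the "Country" prefix are illustrative choices.

```python
import pandas as pd

# Rows from the question.
df = pd.DataFrame(
    {"Country": ["Germany", "Spain", "Germany", "Italy"], "Age": [23, 25, 24, 30]}
)

# prefix= names the new columns Country_<value>.
dummies = pd.get_dummies(df["Country"], prefix="Country")

# Replace the original Country column with the dummy columns.
df_encoded = pd.concat([dummies, df.drop(columns="Country")], axis=1)

print(df_encoded)
```

The dummy columns may be boolean or integer depending on the pandas version; calling .astype(int) on them gives 0/1 values if that is what the model expects.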


祖国的老花朵

#7 · 2020-05-19 08:23

Use the following code:

import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

columnTransformer = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough')

# The np.str alias was removed from recent NumPy releases; the built-in str is equivalent
X = np.array(columnTransformer.fit_transform(X), dtype=str)

print(X)
