Convert dataframe column values into digital numbers

Posted 2019-08-06 03:51

I have the following data in a column of my data frame. How can I replace each domain name with a number? I tried using replace in a for loop, but I have more than 1200 unique domain names, so this does not seem like an ideal way to do it:

for i, v in np.ndenumerate(np.unique(df['domain'])):
    df['domain'] = df['domain'].replace(to_replace=[v], value=i[0]+1, inplace=True)

But it does not work.

data frame:
    type  domain
0    1     yahoo.com
1    1     google.com
2    0     google.com
3    0     aa.com
4    0     google.com
5    0     aa.com
6    1     abc.com
7    1     msn.com
8    1     abc.com
9    1     abc.com
....

I want to convert to

    type  domain
0    1     1
1    1     2
2    0     2
3    0     3
4    0     2
5    0     3
6    1     4
7    1     5
8    1     4
9    1     4
....

2 answers
做个烂人
Reply #2 · 2019-08-06 04:06

If it doesn't really matter which number each domain gets, you can try this:

import pandas as pd 

df.domain.astype('category').cat.codes

Out[154]: 
0    4
1    2
2    2
3    0
4    2
5    0
6    1
7    3
8    1
9    1
dtype: int8
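
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question (the `codes` and `lookup` names are just for illustration). Category codes follow the sorted order of the categories, so adding 1 gives 1-based labels in alphabetical order, not in order of first appearance.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "type": [1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
    "domain": ["yahoo.com", "google.com", "google.com", "aa.com", "google.com",
               "aa.com", "abc.com", "msn.com", "abc.com", "abc.com"],
})

cat = df["domain"].astype("category")

# Codes are 0-based and follow the sorted category order; shift to start at 1
codes = cat.cat.codes + 1

# Reverse lookup (number -> domain name), handy for decoding later
lookup = dict(enumerate(cat.cat.categories, start=1))
```

Keeping `lookup` around lets you translate the numbers back to domain names if you need them again.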

If it does matter (the numbers should follow the order of first appearance, as in your expected output), you can try:

maplist=df[['domain']].drop_duplicates(keep='first').reset_index(drop=True).reset_index().set_index('domain')
maplist['index']=maplist['index']+1
df.domain=df.domain.map(maplist['index'])
Out[177]: 
   type  domain
0     1       1
1     1       2
2     0       2
3     0       3
4     0       2
5     0       3
6     1       4
7     1       5
8     1       4
9     1       4
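
The same first-appearance mapping can probably be written a bit more directly with a plain dict. This is only a sketch on the sample data from the question; `mapping` and the `domain_id` column are names made up for the example.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "type": [1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
    "domain": ["yahoo.com", "google.com", "google.com", "aa.com", "google.com",
               "aa.com", "abc.com", "msn.com", "abc.com", "abc.com"],
})

# Series.unique() returns values in order of first appearance,
# so enumerating from 1 numbers the domains the way the question asks
mapping = {name: i for i, name in enumerate(df["domain"].unique(), start=1)}

# Written to a new column here so the original names stay available
df["domain_id"] = df["domain"].map(mapping)
```

Assign back to `df["domain"]` instead if you want to overwrite the names.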
做自己的国王
Reply #3 · 2019-08-06 04:21

Let's use pd.factorize:

df.assign(domain=pd.factorize(df.domain)[0]+1)

Output:

   type  domain
0     1       1
1     1       2
2     0       2
3     0       3
4     0       2
5     0       3
6     1       4
7     1       5
8     1       4
9     1       4
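
A slightly fuller sketch on the sample data from the question, in case you also need to go back from numbers to names. `pd.factorize` returns both the codes and the unique values in order of first appearance; the variable names here are just for illustration. By default, missing values get code -1, which would become 0 after the `+ 1`.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "type": [1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
    "domain": ["yahoo.com", "google.com", "google.com", "aa.com", "google.com",
               "aa.com", "abc.com", "msn.com", "abc.com", "abc.com"],
})

# codes: 0-based integers; uniques: the distinct domains in first-appearance order
codes, uniques = pd.factorize(df["domain"])

# assign returns a new frame; df itself is left unchanged
out = df.assign(domain=codes + 1)

# Decode: index the uniques with the 0-based codes to recover the names
restored = list(uniques[codes])
```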