Question:

I have a simple dataframe like this for example:

df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith', 'John Doe', 'Jane Smith','Jack Dawson','John Doe']})
df:
        Name
    0   John Doe
    1   Jane Smith
    2   John Doe
    3   Jane Smith
    4   Jack Dawson
    5   John Doe

I want to add a column ['Foreign_Key'] that assigns a unique ID to each unique name (rows with the same name should share the same 'Foreign_Key'). So the final output looks like:

df:
            Name        Foreign_Key
        0   John Doe    foreignkey1
        1   Jane Smith  foreignkey2
        2   John Doe    foreignkey1
        3   Jane Smith  foreignkey2
        4   Jack Dawson foreignkey3
        5   John Doe    foreignkey1

I'm trying to use groupby and then apply a custom function. So my first step is:

name_groupby = df.groupby('Name')

So that's the splitting, and next comes the apply and combine. There doesn't appear to be anything in the docs like this example and I'm unsure where to go from here.

The custom function I started to apply looks like this:

def make_foreign_key(groupby_df):
    return groupby_df['Foreign_Key'] = 'foreign_key' + num

Any help is greatly appreciated!

Answer 1:

You can merge the frame with a lookup table built from the unique names, using that table's index as the key:

pd.merge(
    df,
    pd.DataFrame(df.Name.unique(), columns=['Name']).reset_index().rename(columns={'index': 'Foreign_Key'}),
    on='Name'
)

         Name  Foreign_Key
0    John Doe            0
1    John Doe            0
2  Jane Smith            1
3  Jane Smith            1
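
For reference, the same idea can be sketched against the six-row frame from the question. This is only an illustration under a few assumptions: the names `lookup` and `result` are made up here, and `how='left'` is used (instead of the default inner join above) so that the rows keep their original order.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith', 'John Doe',
                            'Jane Smith', 'Jack Dawson', 'John Doe']})

# Lookup table (hypothetical name): one row per unique name,
# in order of first appearance, with an integer key per name.
lookup = pd.DataFrame({'Name': df['Name'].unique()})
lookup['Foreign_Key'] = range(len(lookup))

# how='left' keeps every row of df in its original order.
result = df.merge(lookup, on='Name', how='left')
print(result)
```

Keeping the lookup table as a separate frame can be handy if the keys need to be reused for another table later.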

Answer 2:

You can make Name into a Categorical with much the same effect:

In [21]: df["Name"].astype('category')
Out[21]:
0       John Doe
1     Jane Smith
2       John Doe
3     Jane Smith
4    Jack Dawson
5       John Doe
Name: Name, dtype: category
Categories (3, object): [Jack Dawson, Jane Smith, John Doe]

See the categorical section of the docs.

That may suffice, or you can pull out the codes and use them as the foreign key.

In [22]: df["Name"] = df["Name"].astype('category')

In [23]: df["Name"].cat.codes
Out[23]:
0    2
1    1
2    2
3    1
4    0
5    2
dtype: int8

In [24]: df["Foreign_Key"] = df["Name"].cat.codes

In [25]: df
Out[25]:
          Name  Foreign_Key
0     John Doe            2
1   Jane Smith            1
2     John Doe            2
3   Jane Smith            1
4  Jack Dawson            0
5     John Doe            2
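
Note that the codes above follow the sorted order of the categories, so 'John Doe' ends up with 2 rather than the first key. If keys in order of first appearance and in the 'foreignkey1' style from the question are wanted, one possible sketch is to pass the categories explicitly. The variable name `names` and the label format are assumptions made for this example, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith', 'John Doe',
                            'Jane Smith', 'Jack Dawson', 'John Doe']})

# Passing explicit categories fixes the code order to first appearance
# instead of the default sorted order.
names = pd.Categorical(df['Name'], categories=df['Name'].unique())

# names.codes is an integer array; shift to 1-based and build string labels.
df['Foreign_Key'] = ['foreignkey{}'.format(code + 1) for code in names.codes]
print(df)
```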

Answer 3:

I faced the same problem a short time ago, and my solution looked like this:

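import pandas as pd
import numpy as np

# Unique names, in order of first appearance
values = df['Name'].unique()
# Series mapping each name (index) to an integer ID (value)
values = pd.Series(np.arange(len(values)), values)
# Look up each row's name in that mapping
df['new_column'] = df['Name'].apply(values.get)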

The output:

          Name  new_column
0     John Doe           0
1   Jane Smith           1
2     John Doe           0
3   Jane Smith           1
4  Jack Dawson           2
5     John Doe           0
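
As a possible shortcut for the same result, pandas also provides `pd.factorize`, and, closer to the groupby attempt in the question, `GroupBy.ngroup`. Neither is used in the answers above, so treat this as an alternative sketch; the column names `key_factorize` and `key_ngroup` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith', 'John Doe',
                            'Jane Smith', 'Jack Dawson', 'John Doe']})

# factorize returns (codes, uniques); codes follow order of first appearance.
codes, uniques = pd.factorize(df['Name'])
df['key_factorize'] = codes

# With sort=False, groups are numbered in order of first appearance as well.
df['key_ngroup'] = df.groupby('Name', sort=False).ngroup()
print(df)
```

Both columns should hold the same integers as `new_column` above; with the default `sort=True`, `ngroup` would instead number the groups in sorted name order, like the categorical codes.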