I have a simple DataFrame, for example:
import pandas as pd
df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith', 'John Doe', 'Jane Smith', 'Jack Dawson', 'John Doe']})
df:
          Name
0     John Doe
1   Jane Smith
2     John Doe
3   Jane Smith
4  Jack Dawson
5     John Doe
I want to add a 'Foreign_Key' column that assigns a unique ID to each unique name, so that rows with the same name share the same 'Foreign_Key'. The final output should look like:
df:
          Name  Foreign_Key
0     John Doe  foreignkey1
1   Jane Smith  foreignkey2
2     John Doe  foreignkey1
3   Jane Smith  foreignkey2
4  Jack Dawson  foreignkey3
5     John Doe  foreignkey1
I'm trying to use groupby and apply a custom function to each group. My first step is:
name_groupby = df.groupby('Name')
That takes care of the split; next come the apply and combine steps. I couldn't find an example like this in the docs, and I'm unsure where to go from here.
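For reference, here is a minimal sketch of one way the apply and combine steps might be avoided altogether, by asking the groupby object to number its groups. It assumes the IDs should follow the order in which names first appear, which is why `sort=False` is passed (an assumption based on the desired output above). `GroupBy.ngroup()` numbers groups starting from 0, so 1 is added to match `foreignkey1`.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith', 'John Doe',
                            'Jane Smith', 'Jack Dawson', 'John Doe']})

# sort=False keeps groups in order of first appearance instead of alphabetical order
name_groupby = df.groupby('Name', sort=False)

# ngroup() gives every row the 0-based number of the group it belongs to
group_number = name_groupby.ngroup() + 1

# Prepend the label from the desired output to build the key strings
df['Foreign_Key'] = 'foreignkey' + group_number.astype(str)
print(df)
```

Without `sort=False`, the groups would be numbered in sorted name order, so 'Jack Dawson' would get the first key rather than the third.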
The custom function I started writing looks like this:
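def make_foreign_key(groupby_df):
    # 'num' is meant to be a different number for each group; I don't know how to get it here
    groupby_df['Foreign_Key'] = 'foreignkey' + str(num)
    return groupby_df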
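A rough sketch of how the per-group number could be supplied is to iterate over the groups with `enumerate` and pass the counter in explicitly. The extra `num` parameter and the use of `pd.concat` for the combine step are assumptions made for illustration, not something taken from the docs, and `sort=False` is again assumed so that numbering follows first appearance.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John Doe', 'Jane Smith', 'John Doe',
                            'Jane Smith', 'Jack Dawson', 'John Doe']})

def make_foreign_key(group_df, num):
    # Work on a copy so the original DataFrame is left untouched
    group_df = group_df.copy()
    group_df['Foreign_Key'] = 'foreignkey' + str(num)
    return group_df

# Split: iterating a groupby yields (group name, sub-DataFrame) pairs.
# Apply: enumerate supplies a counter starting at 1 for each group.
pieces = [
    make_foreign_key(group, num)
    for num, (_, group) in enumerate(df.groupby('Name', sort=False), start=1)
]

# Combine: stitch the pieces together and restore the original row order
result = pd.concat(pieces).sort_index()
print(result)
```

This spells out the split, apply and combine steps by hand, so it is likely slower than a vectorized approach on large frames, but it shows where the counter comes from.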
Any help is greatly appreciated!