Imagine a pandas DataFrame given by
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': [pd.to_datetime('01-01-{}'.format(year)) for year in [2015, 2016, 2015, 2017, 2018]]
}).set_index('id')
which looks like this:
    location       date
id
1          1 2015-01-01
1          2 2016-01-01
1          3 2015-01-01
2          1 2017-01-01
2          2 2018-01-01
Now I want to create a column for each year represented in the date column that counts occurrences by id. Hence the resulting data frame should look like this:
    location       date  2015  2016  2017  2018
id
1          1 2015-01-01     2     1     0     0
1          2 2016-01-01     2     1     0     0
1          3 2015-01-01     2     1     0     0
2          1 2017-01-01     0     0     1     1
2          2 2018-01-01     0     0     1     1
I imagine using groupby with transform, but I can't figure out the best solution.
My own solution was:
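# Temporary helper column holding the year of each date
df['year'] = df['date'].map(lambda x: x.year)
# Count dates per id and year, then merge the counts back on the index
df = pd.merge(
    df,
    pd.pivot_table(df, 'date', 'id', 'year', 'count').fillna(0).astype(int),
    left_index=True, right_index=True).drop('year', axis=1)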
Create a helper DataFrame by groupby with size, unstack and year, and join it to the original df:
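One way these steps could look, as a minimal sketch on the question's df. The names df1 and result are only illustrative, and I'm assuming the year is taken with .dt.year:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': pd.to_datetime(['2015-01-01', '2016-01-01', '2015-01-01',
                            '2017-01-01', '2018-01-01']),
}).set_index('id')

# Helper frame: one row per id, one column per year, values are row counts.
# unstack(fill_value=0) puts 0 where an id has no rows for a year.
df1 = df.groupby([df.index, df['date'].dt.year]).size().unstack(fill_value=0)

# Joining on the index repeats each id's counts on every row of that id.
result = df.join(df1)
print(df1)
print(result)
```

The join works index-on-index, so the original df does not need a temporary year column.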
Detail:
Another solution with crosstab:
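A sketch of how the crosstab variant might look on the question's df. Resetting the index first is my own choice here, so that both inputs to pd.crosstab are plain columns; tmp, counts and result are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': pd.to_datetime(['2015-01-01', '2016-01-01', '2015-01-01',
                            '2017-01-01', '2018-01-01']),
}).set_index('id')

# Move 'id' back to a column so crosstab gets two aligned Series.
tmp = df.reset_index()

# Rows are ids, columns are years, cells are the number of rows per pair.
counts = pd.crosstab(tmp['id'], tmp['date'].dt.year)

# counts is indexed by id, so it joins straight onto df's index.
result = df.join(counts)
print(result)
```

crosstab fills missing id/year pairs with 0 on its own, so no fillna step is needed.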
get_dummies
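For get_dummies, something along these lines should give the same counts. This is a sketch on the question's df, with dummies, counts and result as illustrative names, and dtype=int passed so the indicator columns are integers:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': pd.to_datetime(['2015-01-01', '2016-01-01', '2015-01-01',
                            '2017-01-01', '2018-01-01']),
}).set_index('id')

# One 0/1 indicator column per year, keeping the original 'id' index.
dummies = pd.get_dummies(df['date'].dt.year, dtype=int)

# Summing the indicators within each id gives the per-year counts.
counts = dummies.groupby(level=0).sum()

result = df.join(counts)
print(result)
```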
factorize
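For factorize, one possible approach is to turn ids and years into integer codes and count the code pairs with NumPy. This is only a sketch under my own assumptions: combining the codes with np.bincount is one way to do the counting, and all variable names are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': pd.to_datetime(['2015-01-01', '2016-01-01', '2015-01-01',
                            '2017-01-01', '2018-01-01']),
}).set_index('id')

# Integer codes 0..n-1 for each distinct id and each distinct year.
id_codes, id_uniques = pd.factorize(df.index)
year_codes, year_uniques = pd.factorize(df['date'].dt.year, sort=True)
n_ids, n_years = len(id_uniques), len(year_uniques)

# Encode every (id, year) pair as one flat position and count the positions.
flat = np.bincount(id_codes * n_years + year_codes,
                   minlength=n_ids * n_years)

# Reshape the flat counts into an id-by-year table and join it back.
counts = pd.DataFrame(flat.reshape(n_ids, n_years),
                      index=id_uniques, columns=year_uniques)
result = df.join(counts)
print(result)
```

This skips the groupby machinery, which may matter on large frames, but it is harder to read than the other variants.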