This is similar to "Attach a calculated column to an existing dataframe"; however, that solution doesn't work when grouping by more than one column in pandas v0.14.
For example:
$ df = pd.DataFrame([
[1, 1, 1],
[1, 2, 1],
[1, 2, 2],
[1, 3, 1],
[2, 1, 1]],
columns=['id', 'country', 'source'])
The following calculation works:
$ df.groupby(['id','country'])['source'].apply(lambda x: x.unique().tolist())
0 [1]
1 [1, 2]
2 [1, 2]
3 [1]
4 [1]
Name: source, dtype: object
But assigning the output to a new column results in an error:
df['source_list'] = df.groupby(['id','country'])['source'].apply(
lambda x: x.unique().tolist())
TypeError: incompatible index of inserted column with frame index
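An alternative method that avoids the post-facto merge is to provide the index in the function applied to each group, i.e. have that function return a Series that carries the group's own index, so the result aligns with the original frame.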
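A minimal sketch of that idea, using the example frame from the question. The helper name unique_sources_per_row is hypothetical, and group_keys=False is an assumption added here so that recent pandas versions keep the original row index instead of prepending the group keys:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1], [1, 2, 1], [1, 2, 2], [1, 3, 1], [2, 1, 1]],
    columns=['id', 'country', 'source'])

def unique_sources_per_row(x):
    # Hypothetical helper: x is the 'source' Series of one (id, country) group.
    values = x.unique().tolist()
    # Repeat the list once per row and reuse the group's index,
    # so the result lines up with the rows of the original frame.
    return pd.Series([values] * len(x), index=x.index)

# group_keys=False (an assumption, see above) keeps the original index.
df['source_list'] = (
    df.groupby(['id', 'country'], group_keys=False)['source']
      .apply(unique_sources_per_row))
```

Because the returned Series shares the frame's index, the assignment no longer runs into the index mismatch.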
This can be achieved without the merge by reassigning the result of the groupby.apply to the original dataframe, with your _add_sourcelist_col function adding the new column to each group and returning the group.
Note that additional columns can also be added in your defined function. Simply add them to each group dataframe, and be sure to return the group at the end of your function.
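A rough sketch of how this could look, since the original snippet is not shown here. The body of _add_sourcelist_col is a guess at the intent. Grouping by NumPy arrays of the key columns (rather than by column names) and passing group_keys=False are assumptions made so that, across pandas versions, each group still contains every column and the result keeps the original index:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1], [1, 2, 1], [1, 2, 2], [1, 3, 1], [2, 1, 1]],
    columns=['id', 'country', 'source'])

def _add_sourcelist_col(group):
    # Work on a copy so the caller's data is not modified in place.
    group = group.copy()
    values = group['source'].unique().tolist()
    # One copy of the list per row of the group.
    group['source_list'] = [values] * len(group)
    # Further columns could be added here in the same way.
    return group

# Assumption: group by arrays so 'id' and 'country' stay in each group.
keys = [df['id'].to_numpy(), df['country'].to_numpy()]
df = df.groupby(keys, group_keys=False).apply(_add_sourcelist_col)
```

The reassigned df then has the original columns plus source_list.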
Edit: I'll leave the info above as it might still be useful, but I misinterpreted part of the original question. What the OP was trying to accomplish can be done using,
Merge grouped result with the initial DataFrame:
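One way this merge might be written, as a sketch on the question's example data. The intermediate name grouped and the how='left' choice are assumptions, not taken from the original answer:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1], [1, 2, 1], [1, 2, 2], [1, 3, 1], [2, 1, 1]],
    columns=['id', 'country', 'source'])

# One row per (id, country) pair, holding the list of unique sources.
grouped = (
    df.groupby(['id', 'country'])['source']
      .apply(lambda x: x.unique().tolist())
      .rename('source_list')
      .reset_index())

# A left merge keeps every row of df and attaches the matching list.
merged = df.merge(grouped, on=['id', 'country'], how='left')
```

Here merged has one row per original row, with source_list repeated for rows that share the same id and country.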