In pandas, I have a DataFrame. I group it by one column and want to assign each group's last value of another column to every row of that group, in a new column.
I know that I can select the last row of each group with this command:
import pandas as pd
df = pd.DataFrame({'a': (1,1,2,3,3), 'b':(20,21,30,40,41)})
print(df)
print("-")
result = df.groupby('a').nth(-1)
print(result)
Result:
   a   b
0  1  20
1  1  21
2  2  30
3  3  40
4  3  41
-
    b
a
1  21
2  30
3  41
How can I assign the result of this operation back to the original DataFrame, so that I get something like:
   a   b  b_new
0  1  20     21
1  1  21     21
2  2  30     30
3  3  40     41
4  3  41     41
Two possibilities: groupby + nth + map, or groupby + nth + replace.
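A minimal sketch of what these two options could look like, using the df from the question. The names last_rows, lookup and b_new_via_replace are illustrative, not from the original answer. Because the shape of what nth returns differs between pandas versions, the sketch passes as_index=False and rebuilds the group-key index explicitly; that is a choice made here for robustness, not necessarily how the answer was written.

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})

# Last row of each group. With as_index=False the result keeps 'a' as a
# regular column, so the next step does not depend on how nth indexes its output.
last_rows = df.groupby('a', as_index=False).nth(-1)

# Series that maps each group key in 'a' to that group's last 'b'.
lookup = last_rows.set_index('a')['b']

# Option 1: look up each row's group key with map.
df['b_new'] = df['a'].map(lookup)

# Option 2: substitute each group key by its last 'b' with replace.
b_new_via_replace = df['a'].replace(lookup.to_dict())

print(df)
```

Both options produce the same column here. map leaves NaN for keys missing from the lookup, while replace leaves unmatched values unchanged, so they can differ on other data.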
You can also replace nth(-1) with last() (in fact, doing so happens to make this a little faster), but nth gives you more flexibility over which item to pick from each group in b.
I think this should be fast
Use transform with last:
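A short sketch of the transform approach, assuming the df from the question and the target column name b_new from the desired output:

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})

# transform('last') computes the last 'b' of each group and broadcasts it
# back to every row of that group, aligned with the original index.
df['b_new'] = df.groupby('a')['b'].transform('last')

print(df)
```

Note that last picks the last non-null value in each group, so it can differ from the literal last row when b contains missing values.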
Alternative:
Solution with nth and join:
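One way the nth and join combination might be written, again on the df from the question. The names last_b and result are illustrative. As an assumption made for portability across pandas versions, the sketch uses as_index=False and sets the group key as the index by hand, since join matches the column a against the index of the other object.

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})

# Last row per group, re-indexed by the group key and renamed so that the
# joined column does not clash with the existing 'b'.
last_b = (
    df.groupby('a', as_index=False).nth(-1)
      .set_index('a')['b']
      .rename('b_new')
)

# Join on column 'a' of df against the index of last_b.
result = df.join(last_b, on='a')

print(result)
```

Unlike the assignment-based variants, join returns a new DataFrame and leaves df itself unchanged.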
Timings:
Caveat
The timings do not account for the number of groups, which strongly affects the performance of some of these solutions.