I am trying to create a column that holds a cumulative sum computed from two columns. Please see the example of what I am trying to do below. @Faith Akici
index  lodgement_year  words      sum  cum_sum
0      2000            the        14   14
1      2000            australia  10   10
2      2000            word       12   12
3      2000            brand      8    8
4      2000            fresh      5    5
5      2001            the        8    22
6      2001            australia  3    13
7      2001            banana     1    1
8      2001            brand      7    15
9      2001            fresh      1    6
I have used the code below, but my computer keeps crashing, and I am unsure whether the problem is the code or the computer. Any help will be greatly appreciated:
df_2['cumsum']= df_2.groupby('lodgement_year')['words'].transform(pd.Series.cumsum)
Update: I have also used the code below. It ran and finished with exit code 0, but with some warnings.
df_2['cum_sum'] =df_2.groupby(['words'])['count'].cumsum()
You are almost there, Ian!
The cumsum() method calculates the cumulative sum of a pandas column. You are looking for that applied to the rows grouped by words. Therefore, group by words and take the cumulative sum within each group.
Please comment if this fails on your bigger data set, and we'll work on a possibly more efficient version of this.
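The grouped cumulative sum described above can be sketched as follows. This is a minimal example, not necessarily the exact code from the original answer: the frame name df is an assumption, and the data and column names (words, sum, cum_sum) are copied from the table in the question.

```python
import pandas as pd

# Sample data copied from the question's table (df is an assumed name).
df = pd.DataFrame({
    "lodgement_year": [2000] * 5 + [2001] * 5,
    "words": ["the", "australia", "word", "brand", "fresh",
              "the", "australia", "banana", "brand", "fresh"],
    "sum": [14, 10, 12, 8, 5, 8, 3, 1, 7, 1],
})

# Running total of "sum" within each word; rows keep their original order.
df["cum_sum"] = df.groupby("words")["sum"].cumsum()
print(df)
```

The running total follows the row order of the frame, so if the real data is not already ordered by lodgement_year, sorting with df.sort_values("lodgement_year") first is probably needed.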
(And by the way, I didn't get a notification because you made a typo in my name: Fatih, not Faith :))
If we only need to consider the column 'words', we might need to loop through the unique values of words
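A loop over the unique words could look something like the sketch below. This is only one way to write it, under the assumption that the frame is called df and uses the column names from the question's table. On a large data set it will likely be slower than a single groupby call.

```python
import pandas as pd

# Sample data copied from the question's table (df is an assumed name).
df = pd.DataFrame({
    "lodgement_year": [2000] * 5 + [2001] * 5,
    "words": ["the", "australia", "word", "brand", "fresh",
              "the", "australia", "banana", "brand", "fresh"],
    "sum": [14, 10, 12, 8, 5, 8, 3, 1, 7, 1],
})

# Start with an integer column, then fill it one word at a time.
df["cum_sum"] = 0
for word in df["words"].unique():
    mask = df["words"] == word  # rows belonging to this word
    # Cumulative sum over just those rows, written back by index alignment.
    df.loc[mask, "cum_sum"] = df.loc[mask, "sum"].cumsum()
```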
The above will result in: