I have a pandas DataFrame with a boolean column, sorted by another column, and I need to calculate the reverse cumulative sum of the boolean column, that is, the number of True values from the current row to the bottom.
Example
In [13]: df = pd.DataFrame({'A': [True] * 3 + [False] * 5, 'B': np.random.rand(8) })
In [15]: df = df.sort_values('B')
In [16]: df
Out[16]:
       A         B
6  False  0.037710
2   True  0.315414
4  False  0.332480
7  False  0.445505
3  False  0.580156
1   True  0.741551
5  False  0.796944
0   True  0.817563
I need something that will give me a new column with values
3
3
2
2
2
2
1
1
That is, for each row it should contain the number of True values on this row and the rows below.
I've tried various methods using .iloc[::-1], but the result is not what I want.
I think I'm missing something obvious. I only started using pandas yesterday.
This works but is slow, like @unutbu's answer. True resolves to 1. It fails on False or any other value, though.
Similar to unutbu's first suggestion, but without the deprecated .ix:
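A minimal sketch of what a `.loc`-based version might look like. The frame is rebuilt from the values shown in the question (an assumption, so the result is reproducible), and the output column name `C` is only illustrative:

```python
import pandas as pd

# Values copied from the question's sorted frame (assumption: fixed data
# instead of np.random.rand, so the result is reproducible).
df = pd.DataFrame(
    {
        "A": [False, True, False, False, False, True, False, True],
        "B": [0.037710, 0.315414, 0.332480, 0.445505,
              0.580156, 0.741551, 0.796944, 0.817563],
    },
    index=[6, 2, 4, 7, 3, 1, 5, 0],
)

# .loc[::-1, "A"] selects column A with the rows in reverse order,
# cumsum() counts True values from the bottom up, and the final [::-1]
# restores the original row order.
df["C"] = df.loc[::-1, "A"].cumsum()[::-1]

print(df["C"].tolist())  # [3, 3, 2, 2, 2, 2, 1, 1]
```

Because assignment aligns on the index, the final `[::-1]` does not change the stored result here, but it keeps the intermediate Series in the same order as the frame.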
Reverse column A, take the cumsum, then reverse again:
yields
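A minimal sketch of the reverse, cumsum, reverse approach described above, applied to the boolean column from the question. The helper name `reverse_cumsum` is hypothetical, introduced only for illustration:

```python
import pandas as pd

def reverse_cumsum(s: pd.Series) -> pd.Series:
    """Count of truthy values from each row down to the last row."""
    # Reverse the Series, accumulate, then reverse back to the original order.
    return s[::-1].cumsum()[::-1]

# Column A as it appears in the question after sorting by B.
a = pd.Series(
    [False, True, False, False, False, True, False, True],
    index=[6, 2, 4, 7, 3, 1, 5, 0],
    name="A",
)

result = reverse_cumsum(a)
print(result.tolist())  # [3, 3, 2, 2, 2, 2, 1, 1]
```

The result keeps the original index, so it can be assigned straight back to the frame as a new column.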
Alternatively, you could count the number of Trues in column A and subtract the (shifted) cumsum:
But this is significantly slower. Using IPython to perform the benchmark:
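A rough sketch of both approaches with a timing harness. It uses the standard-library `timeit` module rather than IPython's `%timeit` (an assumption, so it runs as a plain script), and the function names and frame size are illustrative. No timings are quoted here, since they depend on the machine and the pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Illustrative data: 1000 rows, sorted by B as in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"A": rng.random(1000) < 0.5, "B": rng.random(1000)}
).sort_values("B")

def reverse_cumsum(frame):
    # Reverse column A, take the cumsum, then reverse again.
    return frame["A"][::-1].cumsum()[::-1]

def total_minus_shifted(frame):
    # Total number of Trues minus the number of Trues strictly above each row.
    a = frame["A"].astype(int)
    return a.sum() - a.cumsum().shift(1, fill_value=0)

# Both approaches should agree row by row.
same = reverse_cumsum(df).tolist() == total_minus_shifted(df).tolist()
print("results match:", same)

for fn in (reverse_cumsum, total_minus_shifted):
    seconds = timeit.timeit(lambda: fn(df), number=200)
    print(f"{fn.__name__}: {seconds / 200 * 1e6:.1f} microseconds per call")
```

In IPython, the equivalent would be running `%timeit reverse_cumsum(df)` and `%timeit total_minus_shifted(df)` on the same frame.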