I have a pandas series containing zeros and ones:
df1 = pd.Series([ 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])
df1
Out[3]:
0 0
1 0
2 0
3 0
4 0
5 1
6 1
7 1
8 0
9 0
10 0
I would like to create a dataframe df2 that contains the start and end of each interval of identical consecutive values, together with the associated value. In this case df2 should be:
df2
Out[5]:
Start End Value
0 0 4 0
1 5 7 1
2 8 10 0
My attempt was:
from operator import itemgetter
from itertools import groupby
a=[next(group) for key, group in groupby(enumerate(df1), key=itemgetter(1))]
df2 = pd.DataFrame(a,columns=['Start','Value'])
but I don't know how to get the 'End' indices.
What you are looking for is the first and last index of each group in a groupby.
You can groupby by a Series that is created by taking the cumsum of a comparison between df1 and its shifted copy (df1.shift()), so that every run of equal consecutive values gets its own group label. Then apply a custom function and finally reshape with unstack.
Another solution is aggregation with agg using first and last, but it needs more code to shape the result into the desired output.
You can groupby using shift and cumsum, then find the first and last valid index of each group.
You get the df2 shown in the question.
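The code from these answers did not survive, so here is a minimal sketch of the shift/cumsum grouping idea, not the answerers' exact code. It uses the agg variant with first and last. The names s, run_id, frame and idx are illustrative only, and the named-aggregation syntax assumes pandas 0.25 or later.

```python
import pandas as pd

s = pd.Series([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])

# Label each run of equal consecutive values: the comparison is True
# wherever a value differs from its predecessor (and at position 0,
# where the shifted value is NaN), so cumsum increments at each run start.
run_id = s.ne(s.shift()).cumsum()

# Move the index into an ordinary column ("idx", a hypothetical name)
# so it can be aggregated alongside the values.
frame = s.rename("Value").rename_axis("idx").reset_index()

# First and last index per run, plus the run's value.
df2 = (
    frame.groupby(run_id.to_numpy())
    .agg(Start=("idx", "first"), End=("idx", "last"), Value=("Value", "first"))
    .reset_index(drop=True)
)
print(df2)
```

Grouping by run_id.to_numpy() rather than by the Series itself avoids index alignment issues when s does not have a default integer index.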
You could use the pd.Series.diff() method to identify the starting indexes. Then compute the end indexes from these. Finally, gather the associated values. The result is the df2 requested in the question.
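The three steps of the diff() approach can be sketched as follows. This is one possible reading of the answer, not its original code. It assumes a non-empty Series of numeric values (diff() needs numbers), and the names is_start, start_pos and end_pos are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])

# Step 1: diff() is NaN at position 0 and non-zero wherever the value
# changes; fillna(1) makes the first element count as a run start too.
is_start = s.diff().fillna(1).ne(0).to_numpy()
start_pos = np.flatnonzero(is_start)

# Step 2: each run ends one position before the next run starts,
# and the last run ends at the final element (assumes s is non-empty).
end_pos = np.append(start_pos[1:] - 1, len(s) - 1)

# Step 3: gather the value at each run start and build the result.
df2 = pd.DataFrame({
    "Start": s.index[start_pos].to_numpy(),
    "End": s.index[end_pos].to_numpy(),
    "Value": s.to_numpy()[start_pos],
})
print(df2)
```

Working with positions and mapping them back through s.index keeps the sketch usable when the Series has a non-default index.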