This is an example of my data frame:
df_lst = [
{"wordcount": 100, "Stats": 198765, "id": 34},
{"wordcount": 99, "Stats": 98765, "id": 35},
{"wordcount": 200, "Stats": 18765, "id": 36},
{"wordcount": 250, "Stats": 788765, "id": 37},
{"wordcount": 345, "Stats": 12765, "id": 38},
{"wordcount": 456, "Stats": 238765, "id": 39},
{"wordcount": 478, "Stats": 1934, "id": 40},
{"wordcount": 890, "Stats": 19845, "id": 41},
{"wordcount": 812, "Stats": 1987, "id": 42}]
df = pd.DataFrame(df_lst)
df.set_index('id', inplace=True)
df.head()
DF:
Stats wordcount
id
34 198765 100
35 98765 99
36 18765 200
37 788765 250
38 12765 345
I'd like to calculate the average Stats
for each range of wordcount
with a step of 100 so the new data frame looks something like this:
Average wordcount
194567 100
23456 200
2378 300
...
Where 100 means from 0-100 etc. I started writing multiple conditions but have a feeling there is a more efficient way of achieving this. Will appreciate your help.