I am trying to downsample a pandas dataframe in order to reduce granularity. For example, I want to reduce this dataframe:
1 2 3 4
2 4 3 3
2 2 1 3
3 1 3 2
to this (downsampling to obtain a 2x2 dataframe using mean):
2.25 3.25
2 2.25
Is there a builtin or efficient way to do it, or do I have to write it on my own?
Thanks
One option is to use groupby twice. Once for the index:
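A minimal sketch of the index step, assuming the default integer index and a block size of 2 (both are assumptions of this example). Passing a function to groupby applies it to each index label, so rows 0 and 1 fall into group 0 and rows 2 and 3 into group 1:

```python
import pandas as pd

# The 4x4 frame from the question, with the default 0..3 index and columns.
df = pd.DataFrame([[1, 2, 3, 4],
                   [2, 4, 3, 3],
                   [2, 2, 1, 3],
                   [3, 1, 3, 2]])

# Floor-divide each row label by 2 so consecutive row pairs share a group,
# then average within each group.
by_rows = df.groupby(lambda i: i // 2).mean()
print(by_rows)
```

This leaves a 2x4 frame; the columns still need the same treatment.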
and once for the columns:
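A sketch of both steps chained, again assuming the default integer labels and a block size of 2. Recent pandas releases deprecate the axis argument of groupby, so this example transposes instead of grouping along axis=1; that substitution is a choice of this sketch:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4],
                   [2, 4, 3, 3],
                   [2, 2, 1, 3],
                   [3, 1, 3, 2]])

# Step 1: average pairs of rows (labels 0,1 -> group 0; labels 2,3 -> group 1).
by_rows = df.groupby(lambda i: i // 2).mean()

# Step 2: transpose so the columns become the index, average pairs again,
# and transpose back.
coarse = by_rows.T.groupby(lambda j: j // 2).mean().T
print(coarse)
```

Because every block has the same size, the mean of the row means equals the mean of the whole 2x2 block.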
Note: A solution which only calculates the mean once might be preferable. One option is to stack, groupby, mean, and unstack, but at the moment this is a little fiddly.
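One way the stack, groupby, mean, unstack route could look, as a sketch under the same assumptions (default integer labels, block size 2). The variable names are illustrative only:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4],
                   [2, 4, 3, 3],
                   [2, 2, 1, 3],
                   [3, 1, 3, 2]])

# Stack into a Series indexed by (row label, column label).
stacked = df.stack()

# Block id for every cell along each axis.
row_blocks = stacked.index.get_level_values(0).to_numpy() // 2
col_blocks = stacked.index.get_level_values(1).to_numpy() // 2

# A single mean per (row block, column block), then back to a 2-D frame.
coarse = stacked.groupby([row_blocks, col_blocks]).mean().unstack()
print(coarse)
```

Here the mean is computed once per block rather than once per axis.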
This seems significantly faster than Viktor's solution:
In fact, Viktor's solution crashes my (underpowered) laptop for larger DataFrames:
As Viktor points out, this doesn't work with a non-integer index. If that is wanted, you could just store the index and columns as temp variables and feed them back in afterwards:
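A sketch of that idea, grouping by position rather than by label. The helper name downsample_mean and its factor parameter are hypothetical, and labelling each block with the first original label it covers is an assumption of this example:

```python
import numpy as np
import pandas as pd

def downsample_mean(df, factor=2):
    """Hypothetical helper: block-mean a DataFrame by position, not by label."""
    # Keep the original labels so they can be fed back in afterwards.
    old_index, old_columns = df.index, df.columns
    # Positional block ids, e.g. 0, 0, 1, 1 for factor=2.
    row_blocks = np.arange(len(old_index)) // factor
    col_blocks = np.arange(len(old_columns)) // factor
    out = df.groupby(row_blocks).mean().T.groupby(col_blocks).mean().T
    # Label each block with the first original label it covers.
    out.index = old_index[::factor]
    out.columns = old_columns[::factor]
    return out

df = pd.DataFrame([[1, 2, 3, 4],
                   [2, 4, 3, 3],
                   [2, 2, 1, 3],
                   [3, 1, 3, 2]],
                  index=list("abcd"), columns=list("wxyz"))
coarse = downsample_mean(df, factor=2)
print(coarse)
```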
You can use the rolling_mean function applied twice, first on the columns and then on the rows, and then slice the results:
Which gives the same result you want, except the index will be different (but you can fix this using .reset_index(drop=True)):
Timing info:
So it's around 5x slower than the groupby not 800x :)
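The rolling-mean approach described above can be sketched as follows. The top-level rolling_mean function is no longer available in recent pandas releases, so this example swaps in DataFrame.rolling(...).mean(); the window of 2 and the slicing offsets are assumptions matching the 4x4 example:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4],
                   [2, 4, 3, 3],
                   [2, 2, 1, 3],
                   [3, 1, 3, 2]])

# Mean over each pair of adjacent columns (roll along the transposed frame).
smoothed = df.T.rolling(2).mean().T
# Mean over each pair of adjacent rows.
smoothed = smoothed.rolling(2).mean()

# Keep only the windows that line up with the 2x2 blocks: every second
# row and column, starting at position 1 (earlier entries are NaN).
result = smoothed.iloc[1::2, 1::2].reset_index(drop=True)
# reset_index only renumbers the rows, so renumber the columns separately.
result.columns = range(result.shape[1])
print(result)
```

The rolling windows also compute means for the overlapping pairs that are then thrown away, which is one plausible reason this is slower than the groupby version.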