I'm looking for a pandas equivalent of the resample method for a dataframe whose index isn't a DatetimeIndex but an array of integers, or maybe even floats.
I know that for some cases (this one, for example) the resample method can be substituted easily by a reindex and interpolation, but for some cases (I think) it can't.
For example, if I have
df = pd.DataFrame(np.random.randn(10,2))
withdates = df.set_index(pd.date_range('2012-01-01', periods=10))
withdates.resample('5D', np.std)
this gives me
0 1
2012-01-01 1.184582 0.492113
2012-01-06 0.533134 0.982562
but I can't produce the same result with df and resample. So I'm looking for something that would work as
df.resample(5, np.std)
and that would give me
0 1
0 1.184582 0.492113
5 0.533134 0.982562
Does such a method exist? The only way I was able to do this was by manually separating df into smaller dataframes, applying np.std and then concatenating everything back, which I find pretty slow and not smart at all.
Cheers
@piSquared's solution is really nice, but I don't like picking the index by hand when reindexing.
This should also work for any kind of downsampling (float index too), and it automatically picks the mean of the index in each range:
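The code originally posted here is not available, so the following is only a minimal sketch of the idea, not the answer's own implementation. It assumes a float index, a hypothetical bin width `width`, and made-up random data; each row is labelled with the bin its index value falls into, and the result is re-indexed by the mean index value of each bin.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10 rows on a float index 0.0, 0.5, ..., 4.5.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 2)), index=np.arange(10) * 0.5)

width = 2.5  # assumed bin width, expressed in index units

# Label each row with the number of the bin its index value falls into.
bins = np.floor((df.index.to_numpy() - df.index.min()) / width).astype(int)

# Aggregate each bin (pandas' std uses ddof=1).
result = df.groupby(bins).std()

# Replace the bin numbers with the mean index value of each bin.
result.index = df.index.to_series().groupby(bins).mean().to_numpy()
```

With this data the two bins cover index values 0.0 to 2.0 and 2.5 to 4.5, so the new index holds their means, 1.0 and 3.5.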
Now you can pick whichever function you want to calculate in each subgroup:
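As an illustration of swapping the aggregation (again a sketch with made-up data and hypothetical column names `a` and `b`, not the original code), a grouped object accepts built-in reductions, a per-column mapping, or any callable:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a plain integer index 0..9.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.standard_normal((10, 2)), columns=["a", "b"])

# Bin labels for groups of 5 index units: 0,0,0,0,0,1,1,1,1,1
labels = df.index.to_numpy() // 5
by_bin = df.groupby(labels)

# The same function for every column.
stds = by_bin.std()

# A different function per column.
mixed = by_bin.agg({"a": "mean", "b": "max"})

# Any callable that reduces a column to a scalar (here: the range).
spread = by_bin.agg(lambda col: col.max() - col.min())
```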
EDIT: There were some errors in the indexing of s; it is now correct and working.
Alternatively, here is one thing that can be done.
Setup
You need to create the labels to group by yourself. I'd use:
To get you a series of values like
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, ...]
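One way to build labels of that shape, shown as a sketch that assumes a 20-row dataframe and groups of 5 rows (both numbers are assumptions for illustration), is integer division of the row positions:

```python
import numpy as np

n_rows = 20  # assumed number of rows in the dataframe
step = 5     # assumed number of rows per group

# Integer-divide each row position by the group size.
labels = np.arange(n_rows) // step
print(labels.tolist())
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
```

Because the labels depend only on row position, this works regardless of what the index contains.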
Then use this in a groupby.
You'll also need to specify the index for the new dataframe. I'd use:
That gives the current index starting at the 5th position (hence the 4) and every 5th position after that. It will look like [4, 9, 14, 19]. I could've done this as df.index[::5] to get the starting positions, but I went with ending positions.
Solution
Looks like:
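Putting the pieces together, a minimal sketch of the whole downsampling step might look like the following. The 20-row random frame and the name `step` are assumptions made for illustration; the original setup code is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical setup: 20 rows, 2 columns, default integer index.
rng = np.random.default_rng(2)
df = pd.DataFrame(rng.standard_normal((20, 2)))

step = 5

# Group labels by row position: 0,0,0,0,0,1,1,1,1,1,...
labels = np.arange(len(df)) // step

# Ending position of each group: 4, 9, 14, 19
new_index = df.index[step - 1::step]

# Aggregate each group, then label the rows with the chosen index.
down = df.groupby(labels).std()  # pandas' std uses ddof=1
down.index = new_index
```

Using `df.index[::step]` instead would label each group by its starting position.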
Other considerations
This is the equivalent of downsampling. We haven't addressed upsampling.
To go back from what we've produced to a dataframe indexed by something more frequent, we can use reindex like so:
Looks like:
We could also reindex by other things, like range(0, 20, 2), to upsample to even integer indices.
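The upsampling step can be sketched as below. The small frame `down` and its values are made up for illustration, and back-filling is just one assumed choice that matches labelling groups by their ending positions; plain `reindex` leaves the new rows as NaN.

```python
import pandas as pd

# Hypothetical downsampled result, labelled by ending positions.
down = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]}, index=[4, 9, 14, 19])

# Back to every integer position; rows that did not exist become NaN.
up_nan = down.reindex(range(20))

# Same, but each new row takes the next known value (back-fill).
up_bfill = down.reindex(range(20), method="bfill")

# Only the even integer positions.
up_even = down.reindex(range(0, 20, 2), method="bfill")

# Linear interpolation between the known rows, as an alternative to filling.
up_interp = down.reindex(range(20)).interpolate()
```

`method="bfill"` requires the original index to be sorted, which holds here.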