I have a data frame with datetimes and integers
import numpy as np
import pandas as pd
df = pd.DataFrame()
df['dt'] = pd.date_range("2017-01-01 12:00", "2017-01-01 12:30", freq="1min")
df['val'] = np.random.choice(xrange(1, 100), df.shape[0])
Gives me
dt val
0 2017-01-01 12:00:00 33
1 2017-01-01 12:01:00 42
2 2017-01-01 12:02:00 44
3 2017-01-01 12:03:00 6
4 2017-01-01 12:04:00 70
5 2017-01-01 12:05:00 94*
6 2017-01-01 12:06:00 42*
7 2017-01-01 12:07:00 97*
8 2017-01-01 12:08:00 12
9 2017-01-01 12:09:00 11
10 2017-01-01 12:10:00 66
11 2017-01-01 12:11:00 71
12 2017-01-01 12:12:00 25
13 2017-01-01 12:13:00 23
14 2017-01-01 12:14:00 39
15 2017-01-01 12:15:00 25
How can I find which N
-minute group of consecutive dt
gives me the maximum sum of val
?
In this case, if N=3
, then the result should be:
dt val
5 2017-01-01 12:05:00 94
6 2017-01-01 12:06:00 42
7 2017-01-01 12:07:00 97
(marked with stars above)
For simple single values, you can use something like:
Not sure how to generalize this... with most things pandas, there is probably an easier way, but this does work.
Edit: After a little more work, it looks like rolling is what you want:
This is a slightly different, in that the index you get here is the end, so after doing the above you want to do
I think that could be generalized.
You could use
np.convolve
to get the correct starting index and go from there.Demo
This works be effectively "sliding" the kernel (array of ones) across our input and multiply-accumulating the elements in our window of size
N
together.You could use
rolling/sum
andnp.nanargmax
to find the index associated with the first occurrence of the maximum value:prints
iloc
uses ordinal indexing.loc
uses label-based indexing. Provided that bothi-N
andi
are valid indices,df.iloc[i-N : i]
will grab a window (sub-DataFrame) of lengthN
. In contrast,df.loc[i-N, i]
will only grab a window of lengthN
if the index uses consecutive integers. The example above shows a DataFrame wheredf.loc
would not work sincedf.index
has non-consecutive integer values.