I'm quite new to the packages for calculating rolling averages in R, and I hope you can point me in the right direction.
I have the following data as an example:
ms <- c(300, 300, 300, 301, 303, 305, 305, 306, 308, 310, 310, 311, 312,
314, 315, 315, 316, 316, 316, 317, 318, 320, 320, 321, 322, 324,
328, 329, 330, 330, 330, 332, 332, 334, 334, 335, 335, 336, 336,
337, 338, 338, 338, 340, 340, 341, 342, 342, 342, 342)
correct <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
1, 0, 0, 1, 0, 0, 1, 1, 0, 0)
df <- data.frame(ms, correct)
`ms` are time points in milliseconds, and `correct` indicates whether a specific action was performed correctly (1 = correct, 0 = not correct).
My goal is to calculate the percentage correct (or average) over windows of a set number of milliseconds. As you can see, certain time points are missing and certain time points occur multiple times, so I do not want to filter based on row number. I've looked into some packages such as "tidyquant", but it seems to me that these kinds of packages need a time/date variable rather than a numeric variable to determine the window over which values are averaged. Is there a way to specify the window based on the numeric value of `df$ms`?
For the sake of completeness, here is an answer which uses data.table to aggregate in a non-equi join.
The OP has clarified in the comments that he is looking for a sliding window of 5 ms, i.e., windows that go 300-304, 301-305, 302-306, etc.
As there is no data point at 302 ms in the OP's data set, the missing starting points need to be filled in.
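A minimal sketch of how the filled-in variant could look, assuming the data.table package is installed. The names `ws`, `dt`, `windows`, `res` and `share_correct` are illustrative choices, not taken from the original post:

```r
library(data.table)

ms <- c(300, 300, 300, 301, 303, 305, 305, 306, 308, 310, 310, 311, 312,
        314, 315, 315, 316, 316, 316, 317, 318, 320, 320, 321, 322, 324,
        328, 329, 330, 330, 330, 332, 332, 334, 334, 335, 335, 336, 336,
        337, 338, 338, 338, 340, 340, 341, 342, 342, 342, 342)
correct <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
             1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
             1, 0, 0, 1, 0, 0, 1, 1, 0, 0)
df <- data.frame(ms, correct)

ws <- 5                  # window width in ms (assumed from the OP's comment)
dt <- as.data.table(df)  # work on a copy so df itself stays a data.frame

# One window per integer start value, including starts such as 302 that
# do not occur in the data; closed intervals [start, end].
windows <- data.table(start = seq(min(dt$ms), max(dt$ms), by = 1))
windows[, end := start + ws - 1]

# Non-equi right join, aggregated once per window via by = .EACHI.
res <- dt[windows, on = .(ms >= start, ms <= end),
          .(share_correct = mean(correct)), by = .EACHI]

# The two join columns come back holding the window bounds; rename by position.
setnames(res, c("start", "end", "share_correct"))
head(res)
```

A window that contains no observations at all would get `NA` here, since `mean()` is then taken over an unmatched row.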
If the OP is interested only in windows whose starting point exists in the data set, the code can be simplified:
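A sketch of what the simplified variant might look like, again assuming data.table is available. Here the window starts are taken from the distinct `ms` values that actually occur; `ws`, `dt`, `windows`, `res` and `share_correct` are illustrative names:

```r
library(data.table)

ms <- c(300, 300, 300, 301, 303, 305, 305, 306, 308, 310, 310, 311, 312,
        314, 315, 315, 316, 316, 316, 317, 318, 320, 320, 321, 322, 324,
        328, 329, 330, 330, 330, 332, 332, 334, 334, 335, 335, 336, 336,
        337, 338, 338, 338, 340, 340, 341, 342, 342, 342, 342)
correct <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
             1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
             1, 0, 0, 1, 0, 0, 1, 1, 0, 0)
df <- data.frame(ms, correct)

ws <- 5                  # window width in ms (assumed)
dt <- as.data.table(df)

# One window per distinct observed time point; duplicates are dropped.
windows <- unique(dt[, .(start = ms, end = ms + ws - 1)])

# Same non-equi join as before, aggregated per window.
res <- dt[windows, on = .(ms >= start, ms <= end),
          .(share_correct = mean(correct)), by = .EACHI]
setnames(res, c("start", "end", "share_correct"))
head(res)
```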
In both cases, a data.table containing the intervals [start, end] is created on the fly and right joined to `df`. During the non-equi join, the intermediate result is immediately grouped by the join parameters (`by = .EACHI`) and aggregated. Note that closed intervals are used to be in line with the OP's expectations.
Try out:
This could be done with base R:
You can apply it as follows (the default is set to 5 ms; you can change it via the `window_var` parameter):
In your case, you would get (first 10 rows shown only):
It behaves like a rolling mean; however, it does not rely on rows. Instead, it takes the window based on values in a column.
For instance, at rows 6 and 7, it takes the value of the current row (305 ms) and calculates the ratio over all rows in the data frame whose `ms` lies between 305 - 5 = 300 and 305 (inclusive), yielding 2/7 ≈ 0.29.
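A minimal base R sketch consistent with the behaviour described above. Only the parameter name `window_var` comes from the answer; the function name `window_share` and the output column `share_correct` are assumptions made for illustration:

```r
ms <- c(300, 300, 300, 301, 303, 305, 305, 306, 308, 310, 310, 311, 312,
        314, 315, 315, 316, 316, 316, 317, 318, 320, 320, 321, 322, 324,
        328, 329, 330, 330, 330, 332, 332, 334, 334, 335, 335, 336, 336,
        337, 338, 338, 338, 340, 340, 341, 342, 342, 342, 342)
correct <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
             1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
             1, 0, 0, 1, 0, 0, 1, 1, 0, 0)
df <- data.frame(ms, correct)

# Hypothetical helper: for each row, average `correct` over all rows whose
# `ms` lies in the closed window [ms - window_var, ms].
window_share <- function(data, window_var = 5) {
  data$share_correct <- vapply(data$ms, function(x) {
    in_window <- data$ms >= x - window_var & data$ms <= x
    mean(data$correct[in_window])
  }, numeric(1))
  data
}

res <- window_share(df)                   # default window of 5 ms
head(res, 10)
res10 <- window_share(df, window_var = 10) # wider window
```

Because the window always contains the current row, no row ends up with an empty window.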
You can of course always modify the function yourself, e.g. if you'd like a window of 5 to actually mean 301 - 305 and not 300 - 305, you can add + 1 after `x - window_var`, etc.
You can try `cut`. For example, if you want to divide `ms` such that you have 5 groups overall, then you can do:
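A sketch of what the `cut` approach might look like in base R; the column name `group` and the use of `aggregate` are assumptions for illustration. Note that this produces fixed, non-overlapping bins of equal width rather than a sliding window:

```r
ms <- c(300, 300, 300, 301, 303, 305, 305, 306, 308, 310, 310, 311, 312,
        314, 315, 315, 316, 316, 316, 317, 318, 320, 320, 321, 322, 324,
        328, 329, 330, 330, 330, 332, 332, 334, 334, 335, 335, 336, 336,
        337, 338, 338, 338, 340, 340, 341, 342, 342, 342, 342)
correct <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
             1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
             1, 0, 0, 1, 0, 0, 1, 1, 0, 0)
df <- data.frame(ms, correct)

# Split the range of ms into 5 equal-width intervals.
df$group <- cut(df$ms, breaks = 5)

# Share of correct responses per interval.
res <- aggregate(correct ~ group, data = df, FUN = mean)
res
```

To get bins of a fixed width in milliseconds instead of a fixed number of groups, `breaks` can also be given as a vector of cut points, e.g. `seq(300, 345, by = 5)`.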