Here is an example of my dataset. I want to calculate a binned average based on time (i.e., `ts`) every 10 seconds. Could you please provide some hints so that I can carry on?

In my case, I want to average both `ts` and `Var` over every 10-second window. For example, I will get one averaged value of `Var` and `ts` from 0 to 10 seconds, another averaged value from 11 to 20 seconds, and so on.
df = data.frame(ts = seq(1,100,by=0.5), Var = runif(199,1, 10))
Are there any functions or libraries in R that I could use for this task?
There are many ways to calculate a binned average: with base `aggregate` or `by`, with the packages `dplyr` or `data.table`, probably with `zoo`, and surely with other time-series packages.

Here is the dplyr approach; note how both it and data.table let you name the intermediate variables, which keeps the code clean and legible.
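A minimal sketch of that approach, with a data.table equivalent for comparison. The binning rule `ceiling(ts / 10)` (mapping (0,10] to bin 1, (10,20] to bin 2, etc.) and the `set.seed` call are my additions, not part of the question:

```r
library(dplyr)
library(data.table)

set.seed(42)  # the question's data used runif() without a seed, so this is added for reproducibility
df <- data.frame(ts = seq(1, 100, by = 0.5), Var = runif(199, 1, 10))

# dplyr: name the bin as an intermediate variable, then average per bin
res_dplyr <- df %>%
  mutate(bin = ceiling(ts / 10)) %>%   # (0,10] -> 1, (10,20] -> 2, ...
  group_by(bin) %>%
  summarise(mean_ts = mean(ts), mean_Var = mean(Var))

# data.table: the same grouping and aggregation in one expression
dt <- as.data.table(df)
res_dt <- dt[, .(mean_ts = mean(ts), mean_Var = mean(Var)),
             by = .(bin = ceiling(ts / 10))]
```

Both versions produce one row per 10-second bin with the averaged `ts` and `Var`.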
In general, I agree with @smci: the `dplyr` and `data.table` approaches are the best here. Let me elaborate a bit further.

I would not reach for the traditional time-series solutions like `ts`, `zoo` or `xts` here. Their methods are better suited to regular frequencies such as monthly or quarterly data. Apart from `ts`, they can also handle irregular frequencies and high-frequency data, but many methods, such as the print methods, don't work well or at least do not give you an advantage over `data.table` or `data.frame`.

As long as you're just aggregating and grouping, both `data.table` and `dplyr` are also likely faster in terms of performance. My guess is that `data.table` has the edge over `dplyr` in terms of speed, but you would have to benchmark/profile that, e.g. using `microbenchmark`. So if you're not working with a classic R time-series format anyway, there's no reason to switch to one just for aggregating.

Assuming `df` from the question, convert it to a zoo object and then aggregate.

The second argument of `aggregate.zoo` is a vector the same length as the time vector, giving the new time that each original time is to be mapped to. The third argument is applied to all time-series values whose times have been mapped to the same value. This mapping could be done in various ways, but here we have chosen to map times (0, 10] to 10, (10, 20] to 20, etc. by using `10 * ceiling(time(z) / 10)`.

In light of some of the other comments in the answers, let me point out that, in contrast to using a data frame, there is significant simplification here:

- the data has been reduced to one dimension (vs. two in a data.frame);
- it is more conducive to a whole-object approach, whereas with data frames one needs to continually pick the object apart and work on those parts;
- one now has all the facilities of zoo to manipulate the time series: numerous NA-removal schemes, rolling functions, overloaded arithmetic operators, n-way merges, simple access to classic, lattice and ggplot2 graphics, a design that emphasizes consistency with base R (making it easy to learn), extensive documentation (5 vignettes plus help files with numerous examples), and likely very few bugs given its 14 years of development and widespread use.
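A sketch of that zoo route (the `set.seed` call is my addition so the example is reproducible; the question's data omitted it):

```r
library(zoo)

set.seed(42)
df <- data.frame(ts = seq(1, 100, by = 0.5), Var = runif(199, 1, 10))

z <- read.zoo(df)   # first column (ts) becomes the time index
# map each time in (0,10] to 10, (10,20] to 20, ..., then average within each group
a <- aggregate(z, 10 * ceiling(time(z) / 10), mean)
a
```

The result `a` is itself a zoo series, indexed by the bin endpoints 10, 20, ..., 100.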
(Note that the data in the question is not reproducible because it used random numbers without `set.seed`, so if you try to repeat the above you won't get an identical answer.)

Now we could plot it, say, using any of these:
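The specific plotting calls are not shown above; as an illustration (my choice, not necessarily the original list), the aggregated zoo series can go straight to classic, lattice, or ggplot2 graphics, all of which zoo supports directly:

```r
library(zoo)
library(lattice)
library(ggplot2)

set.seed(42)
df <- data.frame(ts = seq(1, 100, by = 0.5), Var = runif(199, 1, 10))
z <- read.zoo(df)
a <- aggregate(z, 10 * ceiling(time(z) / 10), mean)

plot(a)       # classic graphics (plot.zoo)
xyplot(a)     # lattice (xyplot.zoo)
autoplot(a)   # ggplot2 (autoplot.zoo)
# in a script (rather than interactively), wrap the lattice/ggplot2
# calls in print() so the plots actually render
```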