Ok, second R question in quick succession.
My data:
Timestamp St_01 St_02 ...
1 2008-02-08 00:00:00 26.020 25.840 ...
2 2008-02-08 00:10:00 25.985 25.790 ...
3 2008-02-08 00:20:00 25.930 25.765 ...
4 2008-02-08 00:30:00 25.925 25.730 ...
5 2008-02-08 00:40:00 25.975 25.695 ...
...
Basically, I would normally use a combination of ddply and summarize to calculate ensembles (e.g. the mean for every hour across the whole year). For the data above I would create a category, e.g. the hour (data$hour <- format(data$Timestamp, "%H")), and then use that category in ddply, like ddply(data, "hour", summarize, St_01 = mean(St_01), St_02 = mean(St_02), ...), to average by category across each of the columns.

But here is where it gets sticky: I have more than 40 columns to deal with, and I'm not prepared to type them all one by one as parameters to summarize. I used to write a loop in shell to generate this code, but that's not how programmers solve problems, is it?

So pray tell, does anyone have a better way of achieving the same result with fewer keystrokes?
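Spelled out, the verbose version looks roughly like this (a sketch with only two of the 40+ station columns shown):

library(plyr)

# derive the hour-of-day category from the timestamp
data$hour <- format(data$Timestamp, "%H")

# the part that does not scale: every station column listed by hand
ddply(data, "hour", summarize,
      St_01 = mean(St_01),
      St_02 = mean(St_02))
# ... and so on for all 40+ columns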
You can use numcolwise() to run a summary over all numeric columns. Here is an example using iris:
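One minimal form of that example (the grouping column and the summary statistic are illustrative choices, not prescribed by the question):

library(plyr)

# numcolwise(mean) wraps mean() so that it is applied to every numeric
# column of the data frame; non-numeric columns are dropped
ddply(iris, .(Species), numcolwise(mean))
#      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1     setosa        5.006       3.428        1.462       0.246
# 2 versicolor        5.936       2.770        4.260       1.326
# 3  virginica        6.588       2.974        5.552       2.026

# for the data in the question, the analogous call would be
# ddply(data, "hour", numcolwise(mean))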
Similarly, there is catcolwise() to summarise over all categorical columns.

See ?numcolwise for more help and examples.

EDIT
An alternative approach is to use reshape2 (proposed by @gsk3). This has more keystrokes in this example, but gives you enormous flexibility:

library(reshape2)
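# A sketch of the melt-then-summarise route, assuming the hour column from
# the question has already been added; 'variable' and 'value' are reshape2's
# default names for the long-format columns
dmelt <- melt(data, id.vars = "hour",
              measure.vars = grep("^St_", names(data), value = TRUE))

# one mean per hour and per station (the result stays in long format,
# which is where the extra flexibility comes from)
ddply(dmelt, c("hour", "variable"), summarize, value = mean(value))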
You can even simplify the second approach proposed by Andrie by omitting the ddply call completely. Just specify mean as the aggregation function in the dcast call:
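A sketch of that simplification, reusing the melted data frame dmelt from the previous answer:

library(reshape2)

# dcast spreads the stations back out into columns and uses mean as the
# aggregation function for all values that share an hour/station cell
dcast(dmelt, hour ~ variable, fun.aggregate = mean)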
The same result can also be calculated very fast using the data.table package. The .SD variable in the j expression is a special data.table variable containing the subset of data for each group, excluding all columns used in by.
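A sketch of the data.table version (assuming, as in the other answers, an hour column derived from the timestamp, and that all station columns start with "St_"):

library(data.table)

dt <- as.data.table(data)
dt[, hour := format(Timestamp, "%H")]

# .SD holds all columns except those named in by; restrict it to the
# station columns so Timestamp itself is not averaged
station_cols <- grep("^St_", names(dt), value = TRUE)
dt[, lapply(.SD, mean), by = hour, .SDcols = station_cols]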
Yet another option would be the new version 0.2 of Hadley's dplyr package:
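With dplyr 0.2, one way to write this is summarise_each() (a sketch, again assuming the hour column and the St_ naming convention):

library(dplyr)

data %>%
  group_by(hour) %>%
  summarise_each(funs(mean), starts_with("St_"))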