I am trying to use dplyr to find the mean count (number of rows) for each level of a variable across a data frame. First I count the rows per date and bucket:
x <- data %>% group_by(Date, `% Bucket`) %>% summarise(count = n())
   Date       % Bucket  count
   (date)     (fctr)    (int)
1  2015-01-05 <=1        1566
2  2015-01-05 (1-25]      421
3  2015-01-05 (25-50]     461
4  2015-01-05 (50-75]     485
5  2015-01-05 (75-100]    662
6  2015-01-05 (100-150]  1693
7  2015-01-05 >150      12359
8  2015-01-13 <=1        1608
9  2015-01-13 (1-25]      441
10 2015-01-13 (25-50]     425
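Because the question does not show `data`, the sketch below builds a small hypothetical stand-in (two dates, two buckets) so the counting step can be run end to end. It reproduces the `group_by()` plus `summarise(n())` step from the question and, for comparison, dplyr's `count()` shorthand, which does the same grouping and counting but names the result column `n`.

```r
library(dplyr)

# Hypothetical toy data standing in for `data`, which the question does not show:
# one row per observation, each with a date and a bucket label.
data <- data.frame(
  Date = as.Date(c("2015-01-05", "2015-01-05", "2015-01-05",
                   "2015-01-13", "2015-01-13", "2015-01-13", "2015-01-13")),
  `% Bucket` = c("<=1", "<=1", "(1-25]",
                 "<=1", "(1-25]", "(1-25]", "(1-25]"),
  check.names = FALSE  # keep the non-syntactic column name "% Bucket"
)

# The counting step from the question: one row per (Date, % Bucket) pair,
# with `count` holding the number of matching rows.
x <- data %>%
  group_by(Date, `% Bucket`) %>%
  summarise(count = n())

# count() groups and counts in one call; its result column is named `n`.
x_short <- data %>% count(Date, `% Bucket`)

print(x)
```

With this toy data, `x` has four rows; for example, bucket `(1-25]` on 2015-01-13 has a count of 3.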
How can I aggregate this to find the average count for each `% Bucket` over the year with dplyr?
In base R:
x <- as.data.frame(x)
aggregate(count ~ `% Bucket`, data = x, FUN=mean)
% Bucket count
1 <=1 2609.5294
2 (1-25] 449.0000
3 (25-50] 528.7059
4 (50-75] 593.2157
5 (75-100] 763.0000
6 (100-150] 1758.6667
7 >150 12457.9216
The `aggregate` function takes the counts that dplyr computed for each bucket above, sums them, and divides by the number of rows that contain that `% Bucket` value, which gives the answer above. How can I accomplish this with dplyr, though? This is not about solving this particular problem but about understanding how the dplyr package would be used in such a scenario.
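A minimal sketch of the dplyr equivalent, assuming the same hypothetical toy data as a stand-in for `data` and dplyr 1.0.0 or later (for the `.groups` argument). The key idea is that after the first `summarise()`, `count` is an ordinary numeric column, so regrouping by `% Bucket` alone and calling `mean()` on it mirrors `aggregate(count ~ `% Bucket`, data = x, FUN = mean)`. The explicit `sum(count) / n()` version spells out the "sum, then divide by the number of rows" description.

```r
library(dplyr)

# Hypothetical toy data standing in for `data` (not shown in the question).
data <- data.frame(
  Date = as.Date(c("2015-01-05", "2015-01-05", "2015-01-05",
                   "2015-01-13", "2015-01-13", "2015-01-13", "2015-01-13")),
  `% Bucket` = c("<=1", "<=1", "(1-25]",
                 "<=1", "(1-25]", "(1-25]", "(1-25]"),
  check.names = FALSE
)

# Per-date counts; .groups = "drop" returns an ungrouped result.
x <- data %>%
  group_by(Date, `% Bucket`) %>%
  summarise(count = n(), .groups = "drop")

# Regroup by bucket only and average the per-date counts.
avg <- x %>%
  group_by(`% Bucket`) %>%
  summarise(count = mean(count))

# Same result spelled out: sum the counts, then divide by the number
# of rows (dates) that contain that bucket.
avg_explicit <- x %>%
  group_by(`% Bucket`) %>%
  summarise(count = sum(count) / n())

print(avg)
```

For the toy data, bucket `<=1` averages (2 + 1) / 2 = 1.5 and bucket `(1-25]` averages (1 + 3) / 2 = 2, matching what `aggregate` returns on the same `x`.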
Another example of this kind of task would be to summarise the `n()` of each `group_by` variable and also list the minimum "count" of that variable across the 52 weeks.
I am struggling because dplyr seems to be built to find the mean of the values in a column, but here I am counting the number of rows for each value of a column and then trying to find the mean, min, max, etc. of those counts.
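A minimal sketch of computing several statistics of the per-date counts in one `summarise()` call, again assuming the hypothetical toy data and dplyr 1.0.0 or later. The output names `bucket_stats`, `weeks`, `mean_count`, `min_count` and `max_count` are illustrative choices. New names are used on purpose: inside `summarise()`, a later expression sees earlier results, so writing `count = mean(count)` first would make a following `min(count)` operate on the mean instead of on the original counts.

```r
library(dplyr)

# Hypothetical toy data standing in for `data` (not shown in the question).
data <- data.frame(
  Date = as.Date(c("2015-01-05", "2015-01-05", "2015-01-05",
                   "2015-01-13", "2015-01-13", "2015-01-13", "2015-01-13")),
  `% Bucket` = c("<=1", "<=1", "(1-25]",
                 "<=1", "(1-25]", "(1-25]", "(1-25]"),
  check.names = FALSE
)

# Per-date counts, as in the question.
x <- data %>%
  group_by(Date, `% Bucket`) %>%
  summarise(count = n(), .groups = "drop")

# Several summaries of the counts per bucket in one call.
bucket_stats <- x %>%
  group_by(`% Bucket`) %>%
  summarise(
    weeks      = n(),          # number of dates on which the bucket appears
    mean_count = mean(count),
    min_count  = min(count),
    max_count  = max(count)
  )

print(bucket_stats)
```

One caveat: a bucket with no rows on some date has no row in `x` for that date, so that week is skipped rather than counted as 0 (the base `aggregate` call behaves the same way).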
We can use `dplyr` methods