I want to count per country
the number of times the status
is open
and the number of times the status
is closed
. Then calculate the closerate
per country
.
Data:
customer <- c(1,2,3,4,5,6,7,8,9)
country <- c('BE', 'NL', 'NL','NL','BE','NL','BE','BE','NL')
closeday <- c('2017-08-23', '2017-08-05', '2017-08-22', '2017-08-26',
'2017-08-25', '2017-08-13', '2017-08-30', '2017-08-05', '2017-08-23')
closeday <- as.Date(closeday)
df <- data.frame(customer,country,closeday)
Adding status
:
df$status <- ifelse(df$closeday < '2017-08-20', 'open', 'closed')
customer country closeday status
1 1 BE 2017-08-23 closed
2 2 NL 2017-08-05 open
3 3 NL 2017-08-22 closed
4 4 NL 2017-08-26 closed
5 5 BE 2017-08-25 closed
6 6 NL 2017-08-13 open
7 7 BE 2017-08-30 closed
8 8 BE 2017-08-05 open
9 9 NL 2017-08-23 closed
Calculation closerate
closerate <- length(which(df$status == 'closed')) /
(length(which(df$status == 'closed')) + length(which(df$status == 'open')))
[1] 0.6666667
Obviously, this is the closerate
for the total. The challenge is to get the closerate
per country
. I tried adding the closerate
calculation to df
by:
df$closerate <- length(which(df$status == 'closed')) /
(length(which(df$status == 'closed')) + length(which(df$status == 'open')))
But it gives all lines a closerate
of 0.66 because I'm not grouping. I believe I should not use the length function because counting can be done by grouping. I read some information about using dplyr
to count logical outputs per group but this didn't work out.
This is the desired output:
You can use
tapply
:There was a solution using
table
in the comments which appears to have been deleted. Anyway, you could also usetable
A
data.table
method would be.which returns
Here's a quick solution with
tidyverse
:Returning:
Here I am utilizing the coercion of logicals into integer when the vector of TRUE/FALSE is put into the
mean()
function.Alternatively, with
data.table
:Here is a
dplyr
solution.Returns...