I have a sample data frame df that looks like this:
issue_1 issue_2 issue_3 check cat_1 cat_2 cat_3
a - - 0 1 0 0
- b - 1 0 1 0
- - c 1 0 0 1
p - - 0 1 0 0
- - q 1 0 0 1
- r - 0 0 1 0
a - - 1 1 0 0
a b - 1 1 1 0
To explain: it has multiple occurrences of values in issue_1, issue_2 and issue_3, and for each row the value of check is either 0 or 1.
I need to calculate the total number of occurrences of each value for each issue, and the number of rows with check = 1 for each value of each issue. So for the given sample, issue_1 has 3 occurrences of a, 2 of which have check = 1, and one occurrence of p with no rows where check = 1. Similarly for the other two issues.
I used a nested for loop, but instead of counting at the grouped level it gives the total count of rows. Can someone suggest a better way?
Sample code:
library(dplyr)

abc <- c('issue_1', 'issue_2', 'issue_3')
qwe <- c('cat_1', 'cat_2', 'cat_3')
for (i in abc) {
  for (j in qwe) {
    # keep the current issue/category pair plus the check flag
    temp <- df[, c(i, j, 'check')]
    # keep only the rows that belong to category j
    temp <- subset(temp, temp[[j]] != 0)
    temp <- temp %>%
      group_by(temp[[i]]) %>%
      mutate(total_issue = length(temp[[i]])) %>%
      mutate(check_again = length(check[check == 1])) %>%
      mutate(percentage = (check_again / total_issue) * 100)
    # keep one row per issue value
    temp <- subset(temp, !(duplicated(temp[[i]])))
    temp <- temp[, c(i, 'total_issue', 'check_again', 'percentage')]
    assign(paste(i, 'stats', sep = '_'), temp)
    write.csv(temp, paste('path', i, j, '_stats', '.csv'))
  }
}
For example, for issue_1 and cat_1 the result should be:
issue_1 total_issue check_again percentage
a 3 2 2/3*100
p 1 0 0
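For reference, here is a minimal sketch of the grouped counting described above, assuming dplyr 1.0 or later. The helper name issue_stats and the results list are made up for illustration, and dropping rows where the issue is "-" is an assumption about what the placeholder means. The idea is to let summarise() compute n() and sum(check == 1) per group, rather than calling length() on the whole column:

```r
library(dplyr)

# Sample data from the question; "-" is assumed to mark an empty issue cell.
df <- data.frame(
  issue_1 = c("a", "-", "-", "p", "-", "-", "a", "a"),
  issue_2 = c("-", "b", "-", "-", "-", "r", "-", "b"),
  issue_3 = c("-", "-", "c", "-", "q", "-", "-", "-"),
  check   = c(0, 1, 1, 0, 1, 0, 1, 1),
  cat_1   = c(1, 0, 0, 1, 0, 0, 1, 1),
  cat_2   = c(0, 1, 0, 0, 0, 1, 0, 1),
  cat_3   = c(0, 0, 1, 0, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: per-value counts for one issue column within one category.
issue_stats <- function(data, issue, cat) {
  data %>%
    # keep rows in the category, and (assumption) drop the "-" placeholder
    filter(.data[[cat]] != 0, .data[[issue]] != "-") %>%
    group_by(across(all_of(issue))) %>%
    summarise(
      total_issue = n(),              # rows in this group
      check_again = sum(check == 1),  # rows in this group with check == 1
      percentage  = 100 * check_again / total_issue,
      .groups = "drop"
    )
}

issues <- c("issue_1", "issue_2", "issue_3")
cats   <- c("cat_1", "cat_2", "cat_3")

# Collect one result table per issue/category pair in a named list.
results <- list()
for (i in issues) {
  for (j in cats) {
    results[[paste(i, j, sep = "_")]] <- issue_stats(df, i, j)
  }
}

print(results[["issue_1_cat_1"]])
```

With the sample data, the issue_1 / cat_1 table has a with total_issue 3 and check_again 2, and p with total_issue 1 and check_again 0. Each element of results could then be passed to write.csv() with whatever file name is needed.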