My question is similar to this one, with one difference. In that earlier question, I wanted to count, for each row, how many columns satisfied a condition. Here I would like to do something similar, except that the condition combines several columns with OR. My real data has many columns, so ideally I'd like to reference the columns using a regular expression.
I have the following data:
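# Column names: colA1..colA4, colB1..colB4, plus two unrelated columns
colnames <- c(paste0("col", rep(LETTERS[1:2], each = 4), rep(1:4, 2)), c("meh", "muh"))
# 20 rows x 10 columns of random "Yes"/"No" values
df <- as.data.frame(matrix(sample(c("Yes", "No"), 200, replace = TRUE), ncol = 10))
names(df) <- colnames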
df
colA1 colA2 colA3 colA4 colB1 colB2 colB3 colB4 meh muh
1 No Yes No No No Yes Yes No Yes Yes
2 No Yes Yes Yes Yes No Yes No No No
3 No No No Yes No No No No Yes No
4 Yes No Yes Yes Yes Yes Yes Yes No Yes
5 Yes No Yes No No No No Yes No Yes
6 Yes No No No Yes Yes No No No No
7 Yes No No No Yes Yes Yes No Yes No
8 Yes No Yes No Yes Yes No Yes Yes No
9 No Yes No No No Yes Yes No No No
10 Yes Yes No No Yes No Yes No Yes No
11 No Yes No No Yes No Yes Yes No No
12 No Yes Yes Yes No No Yes No No No
13 No No Yes Yes No Yes Yes Yes Yes No
14 Yes Yes No No No No Yes No No Yes
15 Yes No Yes Yes No Yes No Yes No No
16 No Yes Yes No No No Yes No No No
17 Yes No No No No Yes Yes Yes No Yes
18 Yes No Yes Yes No No No No No Yes
19 No No No No No Yes No No No Yes
20 No Yes No No Yes Yes Yes No No No
I would like to create a new column Nb that records, for each row, the number of column groups containing at least one "Yes": 1 if at least one of colA2, colA3, colA4 is == "Yes", plus 1 if at least one of colB2, colB3, colB4 is == "Yes".
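To make the expected result concrete, here is a hard-coded sketch for the two groups, using the first three rows of the output above. The names `ex` and `any_yes` are made up for this illustration. It names every column explicitly, which is exactly what I want to avoid on the real data:

```r
# First three rows of the data shown above (columns 2-4 of each group only)
ex <- data.frame(
  colA2 = c("Yes", "Yes", "No"),
  colA3 = c("No",  "Yes", "No"),
  colA4 = c("No",  "Yes", "Yes"),
  colB2 = c("Yes", "No",  "No"),
  colB3 = c("Yes", "Yes", "No"),
  colB4 = c("No",  "No",  "No"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for each row where at least one of `cols` is "Yes"
any_yes <- function(d, cols) rowSums(d[, cols, drop = FALSE] == "Yes") > 0

# One point per group that contains at least one "Yes"
ex$Nb <- any_yes(ex, c("colA2", "colA3", "colA4")) +
         any_yes(ex, c("colB2", "colB3", "colB4"))
ex$Nb
```

For these three rows Nb should be 2, 2 and 1, whereas simply counting the "Yes" columns would give 3, 4 and 1.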
If there were no "OR" condition within each group of columns (e.g. [colA2, colA3, colA4]), and I were simply adding up the number of columns satisfying the condition, I could have used something like:
df$Nb <- rowSums(df[, grep("^col[A-B][2-4]", names(df))] == "Yes")
I would like to reference the columns with a regex if possible, because in my real data the letters and numbers go well beyond B and 4 respectively.
Thank you!