I want to create a new column in my data frame that is either TRUE or FALSE depending on whether a term occurs in two specified columns.
This is some example data:
AB <- c('CHINAS PARTY CONGRESS','JAPAN-US RELATIONS','JAPAN TRIES TO')
TI <- c('AMERICAN FOREIGN POLICY', 'CHINESE ATTEMPTS TO', 'BRITAIN HAS TEA')
AU <- c('AUTHOR 1', 'AUTHOR 2','AUTHOR 3')
M <- data.frame(AB,TI,AU)
I can do it for one column or the other, but I cannot figure out how to do it for both. In other words, I don't know how to combine these two lines so that the second does not overwrite the result of the first.
M$China <- mapply(grepl, "CHINA|CHINESE|SINO", x=M$AB)
M$China <- mapply(grepl, "CHINA|CHINESE|SINO", x=M$TI)
It is important that I specify the columns; I cannot search the whole data.frame. I have looked at similar questions, but none seemed to apply to my case, and I haven't been able to adapt any existing examples. This is what would make sense to me:
M$China <- mapply(grepl, "CHINA|CHINESE|SINO", x=(M$AB|M$TI)
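The intent of that last line can be sketched with two vectorised grepl calls, one per column, joined by the element-wise | operator. This is a minimal sketch on the example data; the variable name pattern is introduced here only for illustration.

```r
# Example data from the question
AB <- c("CHINAS PARTY CONGRESS", "JAPAN-US RELATIONS", "JAPAN TRIES TO")
TI <- c("AMERICAN FOREIGN POLICY", "CHINESE ATTEMPTS TO", "BRITAIN HAS TEA")
AU <- c("AUTHOR 1", "AUTHOR 2", "AUTHOR 3")
M <- data.frame(AB, TI, AU)

# 'pattern' is an illustrative name, not part of the original code
pattern <- "CHINA|CHINESE|SINO"

# grepl is already vectorised over x, so mapply is not needed;
# | combines the two logical vectors element by element
M$China <- grepl(pattern, M$AB) | grepl(pattern, M$TI)
M$China
# [1]  TRUE  TRUE FALSE
```

Because the OR is applied to the two results rather than to the columns themselves, neither assignment overwrites the other.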
Using:
M$China <- !!rowSums(sapply(M[1:2], grepl, pattern = "CHINA|CHINESE|SINO"))
gives:
> M
                     AB                      TI       AU China
1 CHINAS PARTY CONGRESS AMERICAN FOREIGN POLICY AUTHOR 1  TRUE
2    JAPAN-US RELATIONS     CHINESE ATTEMPTS TO AUTHOR 2  TRUE
3        JAPAN TRIES TO         BRITAIN HAS TEA AUTHOR 3 FALSE
What this does:
- sapply(M[1:2], grepl, pattern = "CHINA|CHINESE|SINO") loops over the two columns AB and TI and checks whether any of the alternatives in the pattern ("CHINA|CHINESE|SINO") is present.
- The sapply call returns a matrix of TRUE/FALSE values:
        AB    TI
[1,]  TRUE FALSE
[2,] FALSE  TRUE
[3,] FALSE FALSE
- With rowSums you count how many TRUE values each row has.
- By adding !! in front of rowSums you convert every count greater than zero to TRUE and every zero to FALSE.
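The steps above can be sketched one at a time, with each intermediate result stored so it can be inspected. The names hits and counts are illustrative, and selecting the columns by name instead of by position (M[1:2]) is a substitution made here for readability. The comparison counts > 0 is used as a more explicit equivalent of !!.

```r
M <- data.frame(
  AB = c("CHINAS PARTY CONGRESS", "JAPAN-US RELATIONS", "JAPAN TRIES TO"),
  TI = c("AMERICAN FOREIGN POLICY", "CHINESE ATTEMPTS TO", "BRITAIN HAS TEA"),
  AU = c("AUTHOR 1", "AUTHOR 2", "AUTHOR 3")
)

# Step 1: one logical column per searched column (a 3 x 2 matrix here)
hits <- sapply(M[c("AB", "TI")], grepl, pattern = "CHINA|CHINESE|SINO")

# Step 2: number of matching columns per row
counts <- rowSums(hits)

# Step 3: any count above zero means the term occurred somewhere in the row;
# this gives the same result as !!counts
M$China <- counts > 0
M$China
# [1]  TRUE  TRUE FALSE
```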
If we need to collapse to a single vector, use Map to loop through the columns and apply the pattern, which gives a list of logical vectors, then Reduce that list to a single logical vector using |:
M$China <- Reduce(`|`, Map(grepl, "CHINA|CHINESE|SINO", M))
M
#                     AB                      TI       AU China
#1 CHINAS PARTY CONGRESS AMERICAN FOREIGN POLICY AUTHOR 1  TRUE
#2    JAPAN-US RELATIONS     CHINESE ATTEMPTS TO AUTHOR 2  TRUE
#3        JAPAN TRIES TO         BRITAIN HAS TEA AUTHOR 3 FALSE
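Note that the call above passes the whole data frame M, so the AU column is searched as well. Since the question asks for specific columns only, here is a hedged variant of the same Reduce idea restricted to AB and TI. The name cols is illustrative, and lapply stands in for Map because the pattern is the same for every column.

```r
M <- data.frame(
  AB = c("CHINAS PARTY CONGRESS", "JAPAN-US RELATIONS", "JAPAN TRIES TO"),
  TI = c("AMERICAN FOREIGN POLICY", "CHINESE ATTEMPTS TO", "BRITAIN HAS TEA"),
  AU = c("AUTHOR 1", "AUTHOR 2", "AUTHOR 3")
)

# Columns to search ('cols' is an illustrative name)
cols <- c("AB", "TI")

# lapply gives one logical vector per selected column;
# Reduce folds them together with element-wise OR
M$China <- Reduce(`|`, lapply(M[cols], grepl, pattern = "CHINA|CHINESE|SINO"))
M$China
# [1]  TRUE  TRUE FALSE
```

Adding a third column to search only requires extending cols.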
Or using the same methodology in tidyverse
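library(tidyverse)
M %>%
  # one logical column per original column;
  # funs() is deprecated since dplyr 0.8.0, so a formula lambda is used instead
  mutate_all(~ str_detect(., "CHINA|CHINESE|SINO")) %>%
  # collapse the logical columns into a single vector with element-wise OR
  reduce(`|`) %>%
  # attach that vector to the original data as the new column
  mutate(M, China = .)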
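The pipeline above also searches every column. With a recent dplyr (if_any is available from version 1.0.4), the row-wise OR over chosen columns can be written in a single mutate call. This is a sketch under that version assumption, not the answer's original method; M2 is an illustrative name.

```r
library(dplyr)
library(stringr)

M <- data.frame(
  AB = c("CHINAS PARTY CONGRESS", "JAPAN-US RELATIONS", "JAPAN TRIES TO"),
  TI = c("AMERICAN FOREIGN POLICY", "CHINESE ATTEMPTS TO", "BRITAIN HAS TEA"),
  AU = c("AUTHOR 1", "AUTHOR 2", "AUTHOR 3")
)

# if_any() is TRUE for a row when the predicate holds in at least one
# of the selected columns (here only AB and TI, so AU is ignored)
M2 <- M %>%
  mutate(China = if_any(c(AB, TI), ~ str_detect(.x, "CHINA|CHINESE|SINO")))
M2$China
# [1]  TRUE  TRUE FALSE
```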