Ignore case in dplyr package

Posted 2019-09-20 16:46

Question:

I have a variable called "Country", and I would like to create a subset where it equals india, INDIA, UAE, or uae. How can I do this in dplyr while ignoring case?

I have tried B <- subset(a, country %in% c("india", "INDIA", "uae", "UAE"))

Answer 1:

To subset in dplyr you would use filter. Here is an example:

library(dplyr)

# tibble() replaces the deprecated data_frame(); dplyr re-exports it
df <- tibble(country = c("india", "INDIA", "uae", "UAE", "US", "Germany", "Some other Country"), val = c(1:7))

# ^ and $ anchor the pattern so only whole values match, not substrings such as "Indiana"
some.countries <- df %>% filter(grepl("^(india|uae)$", country, ignore.case = TRUE))
some.countries
#Source: local data frame [4 x 2]
#
#  country   val
#    (chr) (int)
#1   india     1
#2   INDIA     2
#3     uae     3
#4     UAE     4
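
An alternative that avoids regular expressions is to lowercase the column inside filter() and compare with %in%. The following is a minimal sketch, not part of the original answer; the data frame `sample_df` and the extra "Indiana" row are hypothetical, added only to show that whole values are compared rather than substrings.

```r
library(dplyr)

# Hypothetical sample data; "Indiana" is included as a near-miss value
sample_df <- tibble(
  country = c("india", "INDIA", "uae", "UAE", "US", "Germany", "Indiana"),
  val = 1:7
)

# tolower() is applied only inside the comparison,
# so the stored country values keep their original case
wanted <- c("india", "uae")
exact_matches <- sample_df %>% filter(tolower(country) %in% wanted)
exact_matches
```

Because the comparison is on whole values, "Indiana" is dropped, and the four matching rows keep their original spelling.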


Answer 2:

Converting my comment to an answer: in base R you could do this with:

b = a[a$country=="india" | a$country=="INDIA" | a$country=="uae" | a$country=="UAE",]

As Gopala noted, you can also convert the column to lower case first, which simplifies the logical condition:

a$country <- tolower(a$country)
# R is case-sensitive: use the same column name that was just lowercased
b = a[a$country=="india" | a$country=="uae",]

But note that this overwrites the column, so every country name in a becomes lowercase.



Answer 3:

As mentioned by @gopala, you can convert the country variable to lower or upper case and then use %in%:

a$country <- tolower(a$country)
# the column must be referenced through the data frame (a$country), not as a bare name
b <- a[a$country %in% c("india", "uae"), ]

If you would like to retain the original case of the country variable, for example to use it in a figure title, apply tolower() only inside the comparison:

b <- a[tolower(a$country) %in% c("india", "uae"), ]
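
To check that this approach leaves the original spellings alone, it can be tried on a small data frame. This is a sketch under assumptions: the contents of `a` below are hypothetical, since the question does not show the real data.

```r
# Hypothetical data frame standing in for `a`
a <- data.frame(
  country = c("india", "INDIA", "uae", "UAE", "US", "Germany"),
  val = 1:6,
  stringsAsFactors = FALSE
)

# tolower() produces a temporary lowercase copy used only for the comparison
b <- a[tolower(a$country) %in% c("india", "uae"), ]

b$country   # the subset keeps the original spellings
a$country   # and `a` itself is not modified
```

Nothing is assigned back to a$country here, so both the subset and the source data frame keep mixed-case names.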


Tags: r dplyr