Case Statement Equivalent in R

Posted 2019-01-08 07:05

I have a variable in a dataframe where one of the fields typically has 7-8 values. I want to collapse them into 3 or 4 new categories in a new variable within the dataframe. What is the best approach?

I would use a CASE statement if I were in a SQL-like tool, but I am not sure how to approach this in R.

Any help you can provide will be much appreciated!

Tags: r case
14 answers
姐就是有狂的资本
#2 · 2019-01-08 07:09

I don't like any of these; they are not clear to the reader or the potential user. I just use an anonymous function. The syntax is not as slick as a case statement, but the evaluation is similar (the first matching condition returns) and it is not that painful. This also assumes you are evaluating it in the scope where your variables are defined.

result <- (function() {
  if (x == 10 || y < 5) return('foo')   # scalar conditions, so use || and &&
  if (x == 11 && y == 5) return('bar')
})()

All of those () are necessary to enclose and then call the anonymous function. If no condition matches, result is NULL.
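The anonymous function above works on a single x and y. As a rough sketch of how the same first-match-wins logic could be applied to every row of a data frame, the function can be given a name and called through mapply(). The data values, the name classify, and the "other" fallback are assumptions added for illustration, not part of the original answer.

```r
# Hypothetical example data; the values of x and y are assumptions.
df <- data.frame(x = c(10, 11, 3), y = c(9, 5, 7))

# Same first-match-wins logic as the anonymous function, but named and
# with an explicit fallback so every input yields a value.
classify <- function(x, y) {
  if (x == 10 || y < 5) return("foo")
  if (x == 11 && y == 5) return("bar")
  "other"
}

# mapply() calls classify() once per (x, y) pair, i.e. once per row.
df$result <- mapply(classify, df$x, df$y, USE.NAMES = FALSE)
df$result
```

For large data frames a vectorised approach is likely to be faster, since this calls the function once per row.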

趁早两清
#3 · 2019-01-08 07:11

You can use the base function merge for case-style remapping tasks:

df <- data.frame(name = c('cow','pig','eagle','pigeon','cow','eagle'), 
                 stringsAsFactors = FALSE)

mapping <- data.frame(
  name=c('cow','pig','eagle','pigeon'),
  category=c('animal','animal','bird','bird')
)

merge(df, mapping)  # joins on the shared column 'name'; rows come back sorted by it
#     name category
# 1    cow   animal
# 2    cow   animal
# 3  eagle     bird
# 4  eagle     bird
# 5    pig   animal
# 6 pigeon     bird
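Note that the merge() output above comes back sorted by name rather than in the original row order. If the order matters, one alternative sketch is to index a named character vector by the column, which keeps rows where they are. The name lookup is an assumption introduced here for illustration.

```r
df <- data.frame(name = c('cow', 'pig', 'eagle', 'pigeon', 'cow', 'eagle'),
                 stringsAsFactors = FALSE)

# Named vector: names are the old values, elements are the new categories.
lookup <- c(cow = 'animal', pig = 'animal', eagle = 'bird', pigeon = 'bird')

# Indexing by name returns one category per row, in the original order.
# unname() drops the names carried over from the lookup vector.
df$category <- unname(lookup[df$name])
df
```

Values that are missing from lookup would come back as NA, which makes unmapped categories easy to spot.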
Animai°情兽
#4 · 2019-01-08 07:14

If you have a factor, you can change its levels with the standard method:

df <- data.frame(name = c('cow','pig','eagle','pigeon'), 
             stringsAsFactors = FALSE)
df$type <- factor(df$name) # First step: copy vector and make it factor
# Change levels:
levels(df$type) <- list(
    animal = c("cow", "pig"),
    bird = c("eagle", "pigeon")
)
df
#     name   type
# 1    cow animal
# 2    pig animal
# 3  eagle   bird
# 4 pigeon   bird

You could write a simple wrapper function:

changelevels <- function(f, ...) {
    f <- as.factor(f)
    levels(f) <- list(...)
    f
}

df <- data.frame(name = c('cow','pig','eagle','pigeon'), 
                 stringsAsFactors = TRUE)

df$type <- changelevels(df$name, animal=c("cow", "pig"), bird=c("eagle", "pigeon"))
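One thing worth checking with this approach is what happens to values that are not listed. As far as I know, assigning a list to levels() turns unlisted levels into NA. The sketch below repeats the wrapper so it runs on its own; the extra value "snake" is an assumption added to show that behaviour.

```r
changelevels <- function(f, ...) {
  f <- as.factor(f)
  levels(f) <- list(...)   # names are the new levels, values the old ones
  f
}

# "snake" is deliberately left out of the mapping.
x <- c("cow", "pig", "eagle", "pigeon", "snake")
type <- changelevels(x, animal = c("cow", "pig"), bird = c("eagle", "pigeon"))

levels(type)        # only the new category names remain as levels
as.character(type)  # the unmapped "snake" shows up as NA
```

If silent NAs are not wanted, a check such as anyNA(type) after the call is a cheap safeguard.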
在下西门庆
#5 · 2019-01-08 07:18

Here's a way using the switch statement:

df <- data.frame(name = c('cow','pig','eagle','pigeon'), 
                 stringsAsFactors = FALSE)
df$type <- sapply(df$name, switch, 
                  cow = 'animal', 
                  pig = 'animal', 
                  eagle = 'bird', 
                  pigeon = 'bird')

> df
    name   type
1    cow animal
2    pig animal
3  eagle   bird
4 pigeon   bird
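Two details of switch() may help here, sketched below under the assumption that the input is a character column. An empty alternative such as `cow = ,` falls through to the next non-empty one, and a final unnamed argument acts as the default for values that match nothing. The value 'shark' and the label 'unknown' are made up for illustration.

```r
df <- data.frame(name = c('cow', 'pig', 'eagle', 'shark'),
                 stringsAsFactors = FALSE)

# vapply() guarantees a character vector back, one element per row.
df$type <- vapply(df$name, function(n) {
  switch(n,
         cow = ,            # falls through to the next non-empty value
         pig = 'animal',
         eagle = ,
         pigeon = 'bird',
         'unknown')         # unnamed last argument is the default
}, character(1), USE.NAMES = FALSE)
df
```

Without a default, switch() returns NULL for an unmatched value, and sapply() would then hand back a list instead of a vector.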

The one downside of this is that you have to keep writing the category name (animal, etc.) for each item. It is syntactically more convenient to define the categories as below (see the very similar question "How do add a column in a data frame in R"):

myMap <- list(animal = c('cow', 'pig'), bird = c('eagle', 'pigeon'))

We then want to "invert" this mapping, so that each item points to its category. I wrote my own invMap function:

invMap <- function(map) {
  items <- as.character( unlist(map) )
  nams <- unlist(Map(rep, names(map), sapply(map, length)))
  names(nams) <- items
  nams
}

and then invert the above map as follows:

> invMap(myMap)
     cow      pig    eagle   pigeon 
"animal" "animal"   "bird"   "bird" 

And then it's easy to use this to add the type column in the data-frame:

df <- transform(df, type = invMap(myMap)[name])

> df
    name   type
1    cow animal
2    pig animal
3  eagle   bird
4 pigeon   bird
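The inversion step can probably be written more compactly with base helpers. The following is a sketch rather than a drop-in replacement; invMap2 is a hypothetical name, and it assumes every element of the map is a character vector.

```r
myMap <- list(animal = c('cow', 'pig'), bird = c('eagle', 'pigeon'))

# Repeat each category name once per item, then name the result by item.
invMap2 <- function(map) {
  setNames(rep(names(map), lengths(map)),
           unlist(map, use.names = FALSE))
}

inv <- invMap2(myMap)

df <- data.frame(name = c('cow', 'pig', 'eagle', 'pigeon'),
                 stringsAsFactors = FALSE)
df$type <- unname(inv[df$name])   # look up each name's category
df
```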
趁早两清
#6 · 2019-01-08 07:23

If you want SQL-like syntax, you can use the sqldf package. The function to use is also named sqldf, and the syntax is as follows:

sqldf(<your query in quotation marks>)
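As an illustration of what such a query could look like for the recoding task in the question, here is a sketch that assumes the sqldf package is installed and uses its default SQLite backend. The data frame and the 'other' fallback are assumptions made for the example.

```r
library(sqldf)

df <- data.frame(name = c('cow', 'pig', 'eagle', 'pigeon'),
                 stringsAsFactors = FALSE)

# sqldf() finds the data frame 'df' by name and runs the query against it.
res <- sqldf("
  SELECT name,
         CASE
           WHEN name IN ('cow', 'pig')      THEN 'animal'
           WHEN name IN ('eagle', 'pigeon') THEN 'bird'
           ELSE 'other'
         END AS type
  FROM df
")
res
```

The result is a new data frame, so it has to be assigned if the recoded column should be kept.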
我命由我不由天
#7 · 2019-01-08 07:26

Mixing plyr::mutate and dplyr::case_when works for me and is readable.

library(magrittr)  # provides the %>% pipe used below

iris %>%
  plyr::mutate(coolness =
    dplyr::case_when(Species == "setosa"     ~ "not cool",
                     Species == "versicolor" ~ "not cool",
                     Species == "virginica"  ~ "super awesome",
                     TRUE                    ~ "undetermined"
    )) -> testIris
head(testIris)
levels(testIris$coolness)  ## NULL: case_when() returns a character vector
testIris$coolness <- as.factor(testIris$coolness)
levels(testIris$coolness)  ## ok now
testIris[97:103, 4:6]

Bonus points if the column can come out of mutate as a factor instead of a character vector! The last line of the case_when statement, which catches all unmatched rows, is very important: without it, unmatched rows become NA.

    Petal.Width    Species      coolness
97          1.3 versicolor      not cool
98          1.3 versicolor      not cool
99          1.1 versicolor      not cool
100         1.3 versicolor      not cool
101         2.5  virginica super awesome
102         1.9  virginica super awesome
103         2.1  virginica super awesome
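On the "bonus points" wish above, one possible way to get a factor straight out of mutate is to wrap case_when() in factor(). This is a sketch that assumes dplyr is installed and uses dplyr::mutate rather than plyr::mutate; the order of the levels is my own choice.

```r
library(dplyr)

testIris <- iris %>%
  mutate(coolness = factor(
    case_when(
      Species %in% c("setosa", "versicolor") ~ "not cool",
      Species == "virginica"                 ~ "super awesome",
      TRUE                                   ~ "undetermined"
    ),
    # Listing the levels explicitly fixes their order and keeps
    # "undetermined" as a level even when no row falls into it.
    levels = c("not cool", "super awesome", "undetermined")
  ))

levels(testIris$coolness)
table(testIris$coolness)
```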