Create new variable in R data frame by conditional

Posted 2019-08-09 01:55

Question:

I want to create a new variable in an R data frame by using an existing column as a lookup value on another column in the same table. For example, in the following data frame:

df = data.frame(
  pet = c("smalldog", "mediumdog", "largedog",
             "smallcat", "mediumcat", "largecat"),
  numPets = c(1, 2, 3, 4, 5, 6)
  )

> df

        pet numPets
1  smalldog       1
2 mediumdog       2
3  largedog       3
4  smallcat       4
5 mediumcat       5
6  largecat       6

I want to create a new column called numEnemies which is zero for small animals but, for medium and large animals, equals the number of animals of the same size but of the other species. I want to end up with this:

        pet numPets numEnemies
1  smalldog       1          0
2 mediumdog       2          5
3  largedog       3          6
4  smallcat       4          0
5 mediumcat       5          2
6  largecat       6          3

The way I was attempting to do this was by using conditional logic to generate a character variable which I could then use to look up the final value I want from the same data frame, which got me to here:

calculateEnemies <- function(df) {
  ifelse(grepl('small', df$pet), 0,
         ifelse(grepl('dog', df$pet), gsub('dog', 'cat', df$pet),
                ifelse(grepl('cat', df$pet),
                       gsub('cat', 'dog', df$pet), NA)))
}

df$numEnemies <- calculateEnemies(df)

> df

        pet numPets numEnemies
1  smalldog       1          0
2 mediumdog       2  mediumcat
3  largedog       3   largecat
4  smallcat       4          0
5 mediumcat       5  mediumdog
6  largecat       6   largedog

I want to modify this function to use the newly generated string to look up the values from df$numPets based on the corresponding value in df$pet. I'm also open to a better approach that generalizes.

Answer 1:

Here's how I would approach this using the data.table package:

library(data.table)
setDT(df)[, numEnemies := rev(numPets), by = sub(".*(large|medium).*", "\\1", pet)]
df[grep("^small", pet), numEnemies := 0L]
#          pet numPets numEnemies
# 1:  smalldog       1          0
# 2: mediumdog       2          5
# 3:  largedog       3          6
# 4:  smallcat       4          0
# 5: mediumcat       5          2
# 6:  largecat       6          3

What I did was first group the medium and large animals by size across the whole data set and reverse the numPets values within each group. Then I assigned 0 to numEnemies for every row where pet starts with "small", i.e. the rows selected by grep("^small", pet).
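To see why the grouping works, it can help to look at the grouping key on its own. The following is a small base R sketch using the same pattern as above; the variable names pet and key are only for illustration.

```r
pet <- c("smalldog", "mediumdog", "largedog",
         "smallcat", "mediumcat", "largecat")

# Rows containing "medium" or "large" collapse to just that size;
# rows without a match (the small animals) are returned unchanged.
key <- sub(".*(large|medium).*", "\\1", pet)
key
# [1] "smalldog" "medium"   "large"    "smallcat" "medium"   "large"
```

Each small animal ends up in a group of its own, so rev() leaves its value unchanged until the second line overwrites it with 0.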

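This should be both very efficient and robust, as it will work on any number of animals and you don't need to know the animals' names a priori. It does assume that each size group contains exactly two rows, one per species, since rev() simply swaps them.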
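For comparison, the name-swapping idea from the question can also be finished in base R with match(), which looks each row's "enemy" up by name rather than by position. This is only a sketch, not the method used in the answer above; it assumes every pet name ends in "dog" or "cat", and the helper vector enemy is a name introduced here for illustration.

```r
df <- data.frame(
  pet = c("smalldog", "mediumdog", "largedog",
          "smallcat", "mediumcat", "largecat"),
  numPets = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Build the name of each row's "enemy" by swapping the species suffix.
# Assumption: every name ends in either "dog" or "cat".
enemy <- ifelse(grepl("dog$", df$pet),
                sub("dog$", "cat", df$pet),
                sub("cat$", "dog", df$pet))

# match() gives the row index of each enemy within df$pet
# (NA if that animal is not in the table), which indexes numPets.
df$numEnemies <- df$numPets[match(enemy, df$pet)]

# Small animals have no enemies.
df$numEnemies[grepl("^small", df$pet)] <- 0

df$numEnemies
# [1] 0 5 6 0 2 3
```

Because the lookup is by name, this version does not depend on the row order within each size group, and a missing counterpart shows up as NA instead of silently pairing with the wrong row.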