dplyr - mutate: use dynamic variable names

2018-12-31 08:10发布

I want to use dplyr's mutate() to create multiple new columns in a data frame. The column names and their contents should be dynamically generated.

Example data from iris:

require(dplyr)
data(iris)
iris <- tbl_df(iris)

I've created a function to mutate my new columns from the Petal.Width variable:

multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    df <- mutate(df, varname = Petal.Width * n)  ## problem arises here
    df
}

Now I create a loop to build my columns:

for(i in 2:5) {
    iris <- multipetal(df=iris, n=i)
}

However, since mutate thinks varname is a literal variable name, the loop only creates one new variable (called varname) instead of four (called petal.2 - petal.5).

How can I get mutate() to use my dynamic name as variable name?

标签: r dplyr r-faq
7条回答
荒废的爱情
2楼-- · 2018-12-31 08:33

You may enjoy package friendlyeval which presents a simplified tidy eval API and documentation for newer/casual dplyr users.

You are creating strings that you wish mutate to treat as column names. So using friendlyeval you could write:

multipetal <- function(df, n) {
  varname <- paste("petal", n , sep=".")
  df <- mutate(df, !!treat_string_as_col(varname) := Petal.Width * n)
  df
}

for(i in 2:5) {
  iris <- multipetal(df=iris, n=i)
}

Which under the hood calls rlang functions that check varname is legal as column name.

friendlyeval code can be converted to equivalent plain tidy eval code at any time with an RStudio addin.

查看更多
皆成旧梦
3楼-- · 2018-12-31 08:37

In the new release of dplyr (0.6.0 awaiting in April 2017), we can also do an assignment (:=) and pass variables as column names by unquoting (!!) to not evaluate it

 library(dplyr)
 multipetalN <- function(df, n){
      varname <- paste0("petal.", n)
      df %>%
         mutate(!!varname := Petal.Width * n)
 }

 data(iris)
 iris1 <- tbl_df(iris)
 iris2 <- tbl_df(iris)
 for(i in 2:5) {
     iris2 <- multipetalN(df=iris2, n=i)
 }   

Checking the output based on @MrFlick's multipetal applied on 'iris1'

identical(iris1, iris2)
#[1] TRUE
查看更多
初与友歌
4楼-- · 2018-12-31 08:38

Since you are dramatically building a variable name as a character value, it makes more sense to do assignment using standard data.frame indexing which allows for character values for column names. For example:

multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    df[[varname]] <- with(df, Petal.Width * n)
    df
}

The mutate function makes it very easy to name new columns via named parameters. But that assumes you know the name when you type the command. If you want to dynamically specify the column name, then you need to also build the named argument.

The latest version of dplyr (0.7) does this using by using := to dynamically assign parameter names. You can write your function as:

# --- dplyr version 0.7+---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    mutate(df, !!varname := Petal.Width * n)
}

For more information, see the documentation available form vignette("programming", "dplyr").

Slightly earlier version of dplyr (>=0.3 <0.7), encouraged the use of "standard evaluation" alternatives to many of the functions. See the Non-standard evaluation vignette for more information (vignette("nse")).

So here, the answer is to use mutate_() rather than mutate() and do:

# --- dplyr version 0.3-0.5---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    varval <- lazyeval::interp(~Petal.Width * n, n=n)
    mutate_(df, .dots= setNames(list(varval), varname))
}

Older versions of dplyr

Note this is also possible in older versions of dplyr that existed when the question was originally posed. It requires careful use of quote and setName:

# --- dplyr versions < 0.3 ---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    pp <- c(quote(df), setNames(list(quote(Petal.Width * n)), varname))
    do.call("mutate", pp)
}
查看更多
爱死公子算了
5楼-- · 2018-12-31 08:40

I am also adding an answer that augments this a little bit because I came to this entry when searching for an answer, and this had almost what I needed, but I needed a bit more, which I got via @MrFlik 's answer and the R lazyeval vignettes.

I wanted to make a function that could take a dataframe and a vector of column names (as strings) that I want to be converted from a string to a Date object. I couldn't figure out how to make as.Date() take an argument that is a string and convert it to a column, so I did it as shown below.

Below is how I did this via SE mutate (mutate_()) and the .dots argument. Criticisms that make this better are welcome.

library(dplyr)

dat <- data.frame(a="leave alone",
                  dt="2015-08-03 00:00:00",
                  dt2="2015-01-20 00:00:00")

# This function takes a dataframe and list of column names
# that have strings that need to be
# converted to dates in the data frame
convertSelectDates <- function(df, dtnames=character(0)) {
    for (col in dtnames) {
        varval <- sprintf("as.Date(%s)", col)
        df <- df %>% mutate_(.dots= setNames(list(varval), col))
    }
    return(df)
}

dat <- convertSelectDates(dat, c("dt", "dt2"))
dat %>% str
查看更多
心情的温度
6楼-- · 2018-12-31 08:42

While I enjoy using dplyr for interactive use, I find it extraordinarily tricky to do this using dplyr because you have to go through hoops to use lazyeval::interp(), setNames, etc. workarounds.

Here is a simpler version using base R, in which it seems more intuitive, to me at least, to put the loop inside the function, and which extends @MrFlicks's solution.

multipetal <- function(df, n) {
   for (i in 1:n){
      varname <- paste("petal", i , sep=".")
      df[[varname]] <- with(df, Petal.Width * i)
   }
   df
}
multipetal(iris, 3) 
查看更多
无色无味的生活
7楼-- · 2018-12-31 08:47

Here's another version, and it's arguably a bit simpler.

multipetal <- function(df, n) {
    varname <- paste("petal", n, sep=".")
    df<-mutate_(df, .dots=setNames(paste0("Petal.Width*",n), varname))
    df
}

for(i in 2:5) {
    iris <- multipetal(df=iris, n=i)
}

> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species petal.2 petal.3 petal.4 petal.5
1          5.1         3.5          1.4         0.2  setosa     0.4     0.6     0.8       1
2          4.9         3.0          1.4         0.2  setosa     0.4     0.6     0.8       1
3          4.7         3.2          1.3         0.2  setosa     0.4     0.6     0.8       1
4          4.6         3.1          1.5         0.2  setosa     0.4     0.6     0.8       1
5          5.0         3.6          1.4         0.2  setosa     0.4     0.6     0.8       1
6          5.4         3.9          1.7         0.4  setosa     0.8     1.2     1.6       2
查看更多
登录 后发表回答