data.table: How do I pass a character vector to a

Posted 2020-06-22 07:43

Question:

Here is a data.table:

library(data.table)
DT <- data.table(airquality)

This example produces the output I want:

DT[, `:=`(New_Ozone= log(Ozone), New_Wind=log(Wind))]

How can I write a function log_those_columns such that the following code snippet outputs the same result?

old_names <- c("Ozone", "Wind")
new_names <- c("New_Ozone", "New_Wind")
log_those_columns(DT, old_names, new_names)

Note that I need old_names and new_names to be flexible enough to contain any number of columns.

(I see from the similar StackOverflow questions on this topic that the answer probably involves some combination of .SD, with=F, parse(), eval(), and/or substitute(), but I can't seem to nail which of those to use and where).

Answer 1:

Picking up MichaelChirico's comment, the function definition can be written as:

log_those_columns <- function(DT, cols_in, cols_new) {
  DT[, (cols_new) := lapply(.SD, log), .SDcols = cols_in]
}

which returns:

log_those_columns(DT, old_names, new_names)
DT
     Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
  1:    41     190  7.4   67     5   1  3.713572 2.001480
  2:    36     118  8.0   72     5   2  3.583519 2.079442
  3:    12     149 12.6   74     5   3  2.484907 2.533697
  4:    18     313 11.5   62     5   4  2.890372 2.442347
  5:    NA      NA 14.3   56     5   5        NA 2.660260
 ---                                                     
149:    30     193  6.9   70     9  26  3.401197 1.931521
150:    NA     145 13.2   77     9  27        NA 2.580217
151:    14     191 14.3   75     9  28  2.639057 2.660260
152:    18     131  8.0   76     9  29  2.890372 2.079442
153:    20     223 11.5   68     9  30  2.995732 2.442347

as expected.
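Two details make this work: the parentheses around cols_new tell data.table to use the names stored in that variable (without them, a single column literally named cols_new would be created), and .SDcols restricts .SD to the input columns. As a rough check, the sketch below compares the helper against the hard-coded version from the question. The variable names expected, actual and same_result are only illustrative.

```r
library(data.table)

# Function from the answer above, repeated so the sketch is self-contained
log_those_columns <- function(DT, cols_in, cols_new) {
  DT[, (cols_new) := lapply(.SD, log), .SDcols = cols_in]
}

old_names <- c("Ozone", "Wind")
new_names <- c("New_Ozone", "New_Wind")

# Reference result: the hard-coded version from the question
expected <- data.table(airquality)
expected[, `:=`(New_Ozone = log(Ozone), New_Wind = log(Wind))]

# Result from the helper; := modifies the table by reference,
# so no reassignment is needed after the call
actual <- data.table(airquality)
log_those_columns(actual, old_names, new_names)

# TRUE if both tables have the same columns and values
same_result <- isTRUE(all.equal(expected, actual))
print(same_result)
```

Because := works by reference, the table passed in is changed in place. If the original must stay untouched, passing copy(DT) instead is one option.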

A more flexible approach

The function used to transform the data can be passed as a parameter as well:

fct_those_columns <- function(DT, cols_in, cols_new, fct) {
  DT[, (cols_new) := lapply(.SD, fct), .SDcols = cols_in]
}

The call:

fct_those_columns(DT, old_names, new_names, log)
head(DT)

works as expected:

   Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
1:    41     190  7.4   67     5   1  3.713572 2.001480
2:    36     118  8.0   72     5   2  3.583519 2.079442
3:    12     149 12.6   74     5   3  2.484907 2.533697
4:    18     313 11.5   62     5   4  2.890372 2.442347
5:    NA      NA 14.3   56     5   5        NA 2.660260
6:    28      NA 14.9   66     5   6  3.332205 2.701361

The function name can be passed as character:

fct_those_columns(DT, old_names, new_names, "sqrt")
head(DT)
   Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
1:    41     190  7.4   67     5   1  6.403124 2.720294
2:    36     118  8.0   72     5   2  6.000000 2.828427
3:    12     149 12.6   74     5   3  3.464102 3.549648
4:    18     313 11.5   62     5   4  4.242641 3.391165
5:    NA      NA 14.3   56     5   5        NA 3.781534
6:    28      NA 14.9   66     5   6  5.291503 3.860052

or as an anonymous function:

fct_those_columns(DT, old_names, new_names, function(x) x^(1/2))
head(DT)
   Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
1:    41     190  7.4   67     5   1  6.403124 2.720294
2:    36     118  8.0   72     5   2  6.000000 2.828427
3:    12     149 12.6   74     5   3  3.464102 3.549648
4:    18     313 11.5   62     5   4  4.242641 3.391165
5:    NA      NA 14.3   56     5   5        NA 3.781534
6:    28      NA 14.9   66     5   6  5.291503 3.860052
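All three call styles work because lapply() resolves its function argument with match.fun(), which accepts either a function object or a character string naming one. The sketch below illustrates this on fresh copies of the data. The names dt_obj, dt_name, dt_anon and resolved are made up for this example.

```r
library(data.table)

# Function from the answer above, repeated so the sketch is self-contained
fct_those_columns <- function(DT, cols_in, cols_new, fct) {
  DT[, (cols_new) := lapply(.SD, fct), .SDcols = cols_in]
}

old_names <- c("Ozone", "Wind")
new_names <- c("New_Ozone", "New_Wind")

# match.fun() turns a character string into the function of that name
resolved <- match.fun("sqrt")
print(resolved(16))

# One fresh table per call style, since := modifies its input by reference
dt_obj  <- data.table(airquality)
dt_name <- data.table(airquality)
dt_anon <- data.table(airquality)

fct_those_columns(dt_obj,  old_names, new_names, sqrt)                    # function object
fct_those_columns(dt_name, old_names, new_names, "sqrt")                  # function name as character
fct_those_columns(dt_anon, old_names, new_names, function(x) x^(1/2))     # anonymous function

# all.equal() compares with a numeric tolerance, which allows for tiny
# floating-point differences between sqrt(x) and x^(1/2)
print(isTRUE(all.equal(dt_obj, dt_name)))
print(isTRUE(all.equal(dt_obj, dt_anon)))
```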

An even more flexible approach

The function below automatically derives the names of the new columns by prefixing each input column name with the name of the function:

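fct_those_columns <- function(DT, cols_in, fct) {
  # Capture the unevaluated argument to recover the function's name
  fct_name <- substitute(fct)
  # A bare symbol (e.g. sqrt) is used as is; for pkg::fun or an anonymous
  # function, the third element of the call (fun, or the function body) is used
  prefix <- if (is.name(fct_name)) fct_name else fct_name[3]
  cols_new <- paste(prefix, cols_in, sep = "_")
  DT[, (cols_new) := lapply(.SD, fct), .SDcols = cols_in]
}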

DT <- data.table(airquality)
fct_those_columns(DT, old_names, sqrt)
fct_those_columns(DT, old_names, data.table::as.IDate)
fct_those_columns(DT, old_names, function(x) x^(1/2))
DT
     Ozone Solar.R Wind Temp Month Day sqrt_Ozone sqrt_Wind as.IDate_Ozone as.IDate_Wind x^(1/2)_Ozone x^(1/2)_Wind
  1:    41     190  7.4   67     5   1   6.403124  2.720294     1970-02-11    1970-01-08      6.403124     2.720294
  2:    36     118  8.0   72     5   2   6.000000  2.828427     1970-02-06    1970-01-09      6.000000     2.828427
  3:    12     149 12.6   74     5   3   3.464102  3.549648     1970-01-13    1970-01-13      3.464102     3.549648
  4:    18     313 11.5   62     5   4   4.242641  3.391165     1970-01-19    1970-01-12      4.242641     3.391165
  5:    NA      NA 14.3   56     5   5         NA  3.781534           <NA>    1970-01-15            NA     3.781534
 ---                                                                                                               
149:    30     193  6.9   70     9  26   5.477226  2.626785     1970-01-31    1970-01-07      5.477226     2.626785
150:    NA     145 13.2   77     9  27         NA  3.633180           <NA>    1970-01-14            NA     3.633180
151:    14     191 14.3   75     9  28   3.741657  3.781534     1970-01-15    1970-01-15      3.741657     3.781534
152:    18     131  8.0   76     9  29   4.242641  2.828427     1970-01-19    1970-01-09      4.242641     2.828427
153:    20     223 11.5   68     9  30   4.472136  3.391165     1970-01-21    1970-01-12      4.472136     3.391165

Note that x^(1/2)_Ozone is not a syntactically valid name in R and needs to be put in backquotes:

DT$`x^(1/2)_Ozone`
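If backquoted names are inconvenient, one possible workaround (not part of the original answer) is to sanitize the generated names with base R's make.names() and rename the columns with data.table's setnames(). The sketch below assumes the names were built as in the anonymous-function case above. The variables raw_names and safe_names are only illustrative.

```r
library(data.table)

DT <- data.table(airquality)
cols_in <- c("Ozone", "Wind")

# Names as generated for function(x) x^(1/2) in the example above
raw_names <- paste("x^(1/2)", cols_in, sep = "_")
DT[, (raw_names) := lapply(.SD, function(x) x^(1/2)), .SDcols = cols_in]

# make.names() replaces characters that are not allowed in R names with "."
safe_names <- make.names(raw_names, unique = TRUE)

# Rename the columns by reference
setnames(DT, raw_names, safe_names)

print(safe_names)
print(names(DT))
```

The sanitized names are less readable than the originals, so passing explicit new column names (as in the earlier version of the function) may still be the clearer choice.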


Answer 2:

You only need to write a function:

log_those_columns <- function(DT, old_names, new_names) {
  # mget() looks up the columns named in old_names inside the data.table
  DT[, (new_names) := lapply(mget(old_names), log)]
}
log_those_columns(DT,old_names,new_names)
DT
     Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
  1:    41     190  7.4   67     5   1  3.713572 2.001480
  2:    36     118  8.0   72     5   2  3.583519 2.079442
  3:    12     149 12.6   74     5   3  2.484907 2.533697
  4:    18     313 11.5   62     5   4  2.890372 2.442347
  5:    NA      NA 14.3   56     5   5        NA 2.660260
 ---