Formula with dynamic number of variables

Suppose there is a data.frame foo_data_frame and one wants to regress the target column Y on some of the other columns. Usually a formula and a model are used for that purpose. For example:

linear_model <- lm(Y ~ FACTOR_NAME_1 + FACTOR_NAME_2, foo_data_frame)

That does the job well if the formula is coded statically. If one wants to iterate over several models with a constant number of independent variables (say, 2), it can be handled like this:

# Stop i at factor_number - 1 so that seq(i + 1, factor_number) never counts down.
for (i in seq_len(factor_number - 1)) {
  for (j in seq(i + 1, factor_number)) {
    linear_model <- lm(Y ~ F1 + F2, list(Y=foo_data_frame$Y,
                                         F1=foo_data_frame[[i]],
                                         F2=foo_data_frame[[j]]))
    # Analyze linear_model further...
  }
}

My question is how to achieve the same effect when the number of variables changes dynamically at run time?

for (number_of_factors in seq_len(5)) {
   # Then iterate over the subsets of cardinality number_of_factors.
   for (factors_subset in all_subsets_with_fixed_cardinality) {
     # Here I want to fit a model with the factors from factors_subset.
     linear_model <- lm(Does R provide something to write here?)
   }
}

Tags: r formula

5 Answers

太酷不给撩

#2 · 2019-01-02 23:17

An oft forgotten function is reformulate. From ?reformulate:

reformulate creates a formula from a character vector.

A simple example:

listoffactors <- c("factor1","factor2")
reformulate(termlabels = listoffactors, response = 'y')

will yield this formula:

y ~ factor1 + factor2

Although not explicitly documented, you can also add interaction terms:

listofintfactors <- c("(factor3","factor4)^2")
reformulate(termlabels = c(listoffactors, listofintfactors), 
    response = 'y')

will yield:

y ~ factor1 + factor2 + (factor3 + factor4)^2
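Applied to the loop from the question, this could look roughly like the sketch below. The example data, the column names A to D and the `models` list are made up for illustration; `combn()` is used here as one way to enumerate the subsets, which the question left open.

```r
# Hypothetical example data: a response Y and four candidate predictors.
set.seed(1)
foo_data_frame <- data.frame(Y = rnorm(30), A = rnorm(30), B = rnorm(30),
                             C = rnorm(30), D = rnorm(30))
candidate_names <- setdiff(names(foo_data_frame), "Y")

models <- list()
for (number_of_factors in seq_along(candidate_names)) {
  # With simplify = FALSE, combn() returns a list of character vectors,
  # one per subset of the requested size.
  subsets <- combn(candidate_names, number_of_factors, simplify = FALSE)
  for (factors_subset in subsets) {
    model_formula <- reformulate(termlabels = factors_subset, response = "Y")
    key <- paste(factors_subset, collapse = "+")
    models[[key]] <- lm(model_formula, data = foo_data_frame)
  }
}

length(models)  # 4 + 6 + 4 + 1 = 15 fitted models
```

Each fitted model is stored under a name such as "A+C", so it can be looked up later with models[["A+C"]].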

叛逆

#3 · 2019-01-02 23:19

I generally solve this by changing the name of my response column. It is easier to do dynamically, and possibly cleaner.

model_response <- "response_field_name"
setnames(model_data_train, c(model_response), "response") #if using data.table
model_gbm <- gbm(response ~ ., data=model_data_train, ...)
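The same idea can be sketched with base R only, assuming lm in place of gbm and plain names() assignment in place of data.table's setnames. The data and the names sales, price, ads, season and chosen_predictors are hypothetical. Subsetting the columns first also lets the "." shorthand cover a dynamically chosen set of predictors.

```r
# Hypothetical data; "sales" plays the role of the dynamically chosen response.
set.seed(2)
model_data_train <- data.frame(sales = rnorm(20), price = rnorm(20),
                               ads = rnorm(20), season = rnorm(20))
model_response <- "sales"
chosen_predictors <- c("price", "ads")

# Keep only the needed columns, then rename the response column.
model_data <- model_data_train[c(model_response, chosen_predictors)]
names(model_data)[names(model_data) == model_response] <- "response"

# "." expands to every column of `data` other than the response.
fit <- lm(response ~ ., data = model_data)
names(coef(fit))
```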

时光不老，我们不散

#4 · 2019-01-02 23:29

You don't actually need a formula. This works:

lm(data_frame[c("Y", "factor1", "factor2")])

as does this:

v <- c("Y", "factor1", "factor2")
do.call("lm", list(bquote(data_frame[.(v)])))
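This works because a data frame passed in place of a formula is converted to one, with the first column as the response and the remaining columns as predictors. A small sketch with made-up data (the column `unused` and the name `implied_formula` are only for illustration):

```r
# Hypothetical data with one column that should stay out of the model.
set.seed(3)
data_frame <- data.frame(Y = rnorm(20), factor1 = rnorm(20),
                         factor2 = rnorm(20), unused = rnorm(20))
v <- c("Y", "factor1", "factor2")

# formula() on a data frame: first column ~ sum of the remaining columns.
implied_formula <- formula(data_frame[v])
implied_formula

# lm() performs the same conversion internally.
fit <- lm(data_frame[v])
names(coef(fit))
```

Column order matters with this approach: whichever column comes first in v is treated as the response.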

爱情/是我丢掉的垃圾

#5 · 2019-01-02 23:36

See ?as.formula, e.g.:

factors <- c("factor1", "factor2")
as.formula(paste("y~", paste(factors, collapse="+")))
# y ~ factor1 + factor2

where factors is a character vector containing the names of the factors you want to use in the model. You can pass the resulting formula to lm, e.g.:

set.seed(0)
y <- rnorm(100)
factor1 <- rep(1:2, each=50)
factor2 <- rep(3:4, 50)
lm(as.formula(paste("y~", paste(factors, collapse="+"))))

# Call:
# lm(formula = as.formula(paste("y~", paste(factors, collapse = "+"))))

# Coefficients:
# (Intercept)      factor1      factor2  
#    0.542471    -0.002525    -0.147433
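If the variables live in a data frame rather than in the global environment, the same formula can be combined with the data argument. The sketch below assumes a hypothetical data frame called example_data built from the same values:

```r
# Same values as above, but collected in a (hypothetical) data frame.
set.seed(0)
example_data <- data.frame(y = rnorm(100),
                           factor1 = rep(1:2, each = 50),
                           factor2 = rep(3:4, 50))
factors <- c("factor1", "factor2")

# Build the formula once, then hand it to lm() together with the data.
model_formula <- as.formula(paste("y ~", paste(factors, collapse = " + ")))
fit <- lm(model_formula, data = example_data)
formula(fit)
```

Storing the formula in a variable first keeps the lm call short and makes it easy to reuse the same formula with other modelling functions.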

Emotional °昔

#6 · 2019-01-02 23:38

Another option could be to use a matrix in the formula:

Y = rnorm(10)
foo = matrix(rnorm(100),10,10)
factors=c(1,5,8)

lm(Y ~ foo[,factors])
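With a data.frame like the one in the question, the selected columns would first have to be converted to a matrix. A rough sketch, assuming made-up columns a, b, c and a hypothetical selection factors_subset:

```r
# Hypothetical data frame with a response Y and three candidate predictors.
set.seed(4)
foo_data_frame <- data.frame(Y = rnorm(20), a = rnorm(20),
                             b = rnorm(20), c = rnorm(20))
factors_subset <- c("a", "c")

# Convert the selected columns to a numeric matrix and use it as a single term.
X <- as.matrix(foo_data_frame[factors_subset])
fit <- lm(foo_data_frame$Y ~ X)
length(coef(fit))  # intercept plus one coefficient per selected column
```

One trade-off of this approach is that the whole matrix counts as one term in the formula, so the coefficient labels are derived from the matrix expression rather than from plain column names.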
