可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

First off, I am pretty new to this so my method/thinking may be wrong, I have imported a xlsx data set into a data frame using R and R studio. I want to be able to loop through the column names to get all of the variables with exactly "10" in them in order to run a simple linear regression. So here's my code:

indx <- grepl('_10_', colnames(data)) #list returns all of the true values in the data set
col10 <- names(data[indx]) #this gives me the names of the columns I want

Here is the for loop I have which returns an error:

temp <- c()
for(i in 1:length(col10)){
   temp = col10[[i]]
  lm.test <- lm(Total_Transactions ~ temp[[i]], data = data)
  print(temp) #actually prints out the right column names
  i + 1
}

Is it even possible to run a loop to place those variables in the linear regression model? The error I am getting is: "Error in model.frame.default(formula = Total_Transactions ~ temp[[i]], : variable lengths differ (found for 'temp[[i]]')". If anyone could point me in the right direction I would be very grateful. Thanks.

回答1:

Ok, I'll post an answer. I will use the dataset mtcarsas an example. I believe it will work with your dataset.
First, I create a store, lm.test, an object of class list. In your code you are assigning the output of lm(.) every time through the loop and in the end you would only have the last one, all others would have been rewriten by the newer ones.
Then, inside the loop, I use function reformulate to put together the regression formula. There are other ways of doing this but this one is simple.

# Use just some columns
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

lm.test <- vector("list", length(col10))

for(i in seq_along(col10)){
    lm.test[[i]] <- lm(reformulate(col10[i], "mpg"), data = data)
}

lm.test

Now you can use the results list for all sorts of things. I suggest you start using lapply and friends for that.
For instance, to extract the coefficients:

cfs <- lapply(lm.test, coef)

In order to get the summaries:

smry <- lapply(lm.test, summary)

It becomes very simple once you're familiar with *apply functions.

回答2:

You can create a temporary subset in which you select only the columns used in your regression. This way, you won't need to inject the temporary name in the formula.

Sticking up to your code, this should do the trick.

for(i in 1:length(col10)){
 tempSubset <- data[,c("Total_Transactions", col10[i]]
 lm.test <- lm(Total_Transactions ~ ., data = tempSubset)
 i + 1
}

R Loop for Variable Names to run linear regression

问题:

回答1:

回答2:

收藏的人(0)

R Loop for Variable Names to run linear regression

问题:

回答1:

回答2:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮