可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

This question already has an answer here:

Linear Regression and group by in R 10 answers

I am new to R and am trying to run a linear regression on multiple subsets ("Cases") of data in a single file. I have 50 different cases, so I don't want to have to run 50 different regressions...be nice to automate this. I have found and experimented with the ddply method, but this, for some reason, returns the same coefficients to me for each case. Code I'm using is as follows:

ddply(MyData, "Case", function(x) coefficients(lm(Y~X1+X2+X3, MyData)))

Results I get, again, are the same coefficients for each "Case". Any ideas on how I can improve my code so that the regression runs once for each case and gives me unique coefficients for each case?

回答1:

ddply passes data.frames (from splitting the input data.frame) to the function. You probably want this:

ddply(MyData, "Case", function(df) coefficients(lm(Y~X1+X2+X3, data=df)))

(Not tested since you don't provide a reproducible example.)

You passed the whole input data.frame to lm for each group..

回答2:

You can also do it using only base functions:

# load data
data(warpbreaks)

# fit linear model to each subset
fits <- by(warpbreaks, warpbreaks[,"tension"],
           function(x) lm(breaks ~ wool, data = x))

# Combine coefficients from each model
do.call("rbind", lapply(fits, coef))

回答3:

Though you can get pretty far with (d)plyr, I get the most flexibility with a simple for loop (as I’m not too confident with the do() when it comes to wanting more than just the output of coefficients(), for example)

Let’s start by loading the data and loading some packages:

library(dplyr)
library(broom) # get lm output (coefficients) as a dataframe
library(reshape2) # Melt / DCAST for swapping between wide / long format
data(warpbreaks)

Next, create a list of groups we’re going to be iterating over.

GROUPS <- unique(warpbreaks$tension) # Specify groups, one unique model per group.

The code to answer your question, in this case, would be something of the sorts:

for (i in 1:length(GROUPS)){
  CURRENT_GROUP <- GROUPS[i]
  df <- filter(warpbreaks, tension==CURRENT_GROUP) # subset the dataframe
  fit <- lm(breaks ~ wool, data = df) # Build a model
  coeff <- tidy(fit) # Get a pretty data frame of the coefficients & p values
  coeff <- coeff[,c(1,2,5)] # Extract P.Value & Estimate

  # Rename (intercept) to INT for pretty column names
  coeff[coeff$term=="(Intercept)", ]$term <- "INT"

  # Make it into wide format with reshape2 package.
  coeff <- coeff %>% melt(id.vars=c("term"))

  # Defactor the resulting data.frame
  coeff <- mutate(coeff,
                  variable=as.character(variable),
                  term=as.character(term))

  # Rename for prettier column names later
  coeff[coeff$variable=="estimate", ]$variable <- "Beta"
  coeff[coeff$variable=="p.value", ]$variable <- "P"

  coeff <- dcast(coeff, .~term+variable)[,-1]

  rsquared <- summary(fit)$r.squared

  # Create a df out of what we just did.
  row <- cbind(
    data.frame(
      group=CURRENT_GROUP,
      rsquared=rsquared),
    coeff
  )

  # If first iteration, create data.frame -- otherwise: rowbind
  if (i==1){
    RESULT_ROW = row
  }
  else{
    RESULT_ROW = rbind(RESULT_ROW, row)
  } # End if.
} # End for loop

When writing the inner for loop code, just test something out by simulating the loop (first run i <- 1, run the inner loop code step by step, then run i <- 2 etc.). The resulting dataframe:

RESULT_ROW  
##   group   rsquared INT_Beta        INT_P woolB_Beta    woolB_P
## 1     L 0.26107601 44.55556 9.010046e-08 -16.333333 0.03023435
## 2     M 0.07263228 24.00000 5.991177e-07   4.777778 0.27947884
## 3     H 0.12666292 24.55556 9.234773e-08  -5.777778 0.14719571

I’m not saying this is better than the answers described above, I just like the fact that this gives me the flexibility to extract anything from any model (not just lm models that have a pretty coefficients() function).