How to add a column of fitted values to a data fra

2019-08-09 05:10发布

Say I have a data frame like this:

X <- data_frame(
  x = rep(seq(from = 1, to = 10, by = 1), 3),
  y = 2*x + rnorm(length(x), sd = 0.5),
  g = rep(LETTERS[1:3], each = length(x)/3))

How can I fit a regression y~x grouped by variable g and add the values from the fitted and resid generic methods to the data frame?

I know I can do:

A <- X[X$g == "A",]
mA <- with(A, lm(y ~ x))
A$fit <- fitted(mA)
A$res <- resid(mA)

B <- X[X$g == "B",]
mB <- with(B, lm(y ~ x))
B$fit <- fitted(mB)
B$res <- resid(mB)

C <- X[X$g == "C",]
mC <- with(B, lm(y ~ x))
C$fit <- fitted(mC)
C$res <- resid(mC)

And then rbind(A, B, C). However, in real life I am not using lm (I'm using rqss in the quantreg package). The method occasionally fails, so I need error handling, where I'd like to place NA all the rows that failed. Also, there are way more than 3 groups, so I don't want to just keep copying and pasting code for each group.

I tried using dplyr with do but didn't make any progress. I was thinking it might be something like:

make_qfits <- function(data) {
  data %>%
    group_by(g) %>%
    do(failwith(NULL, rqss), formula = y ~ qss(x, lambda = 3))
}

Would this be easy to do by that approach? Is there another way in base R?

3条回答
趁早两清
2楼-- · 2019-08-09 05:48

For the lm models you could try

library(nlme)     # lmList to do lm by group
library(ggplot2)  # fortify to get out the fitted/resid data
do.call(rbind, lapply(lmList(y ~ x | g, data=X), fortify))

This gives you the residual and fitted data in ".resid" and ".fitted" columns as well as a bunch of other fit data. By default the rownames will be prefixed with the letters from g.

With the rqss models that might fail

do.call(rbind, lapply(split(X, X$g), function(z) {
    fit <- tryCatch({
        rqss(y ~ x, data=z)
    }, error=function(e) NULL)
    if (is.null(fit)) data.frame(resid=numeric(0), fitted=numeric(0))
    else data.frame(resid=fit$resid, fitted=fitted(fit))
}))
查看更多
Root(大扎)
3楼-- · 2019-08-09 05:51

You can use do on grouped data for this task, fitting the model in each group in do and putting the model residuals and fitted values into a data.frame. To add these to the original data, just include the . that represents the data going into do in the output data.frame.

In your simple case, this would look like this:

X %>%
    group_by(g) %>%
    do({model = rqss(y ~ qss(x, lambda = 3), data = .)
        data.frame(., residuals = resid.rqss(model), fitted = fitted(model))
            })

Source: local data frame [30 x 5]
Groups: g

    x         y g     residuals    fitted
1   1  1.509760 A -1.368963e-08  1.509760
2   2  3.576973 A -8.915993e-02  3.666133
3   3  6.239950 A  4.174453e-01  5.822505
4   4  7.978878 A  4.130033e-09  7.978878
5   5 10.588367 A  4.833475e-01 10.105020
6   6 11.786445 A -3.807876e-01 12.167232
7   7 14.646221 A  4.167763e-01 14.229445
8   8 15.938253 A -3.534045e-01 16.291658
9   9 19.114927 A  7.610560e-01 18.353871
10 10 19.574449 A -8.416343e-01 20.416083
.. ..       ... .           ...       ...

Things will look more complicated if you need to catch errors. Here is what it would look like using try and filling the residuals and fitted columns with NA if fit attempt for the group results in an error.

X[9:30,] %>%
    group_by(g) %>%
    do({catch = try(rqss(y ~ qss(x, lambda = 3), data = .))
    if(class(catch) == "try-error"){
        data.frame(., residuals = NA, fitted = NA)
    }
    else{
        model = rqss(y ~ qss(x, lambda = 3), data = .)
        data.frame(., residuals = resid.rqss(model), fitted = fitted(model))
        }
    })
Source: local data frame [22 x 5]
Groups: g

    x         y g     residuals    fitted
1   9 19.114927 A            NA        NA
2  10 19.574449 A            NA        NA
3   1  2.026199 B -4.618675e-01  2.488066
4   2  4.399768 B  1.520739e-11  4.399768
5   3  6.167690 B -1.437800e-01  6.311470
6   4  8.642481 B  4.193089e-01  8.223172
7   5 10.255790 B  1.209160e-01 10.134874
8   6 12.875674 B  8.290981e-01 12.046576
9   7 13.958278 B -4.803891e-10 13.958278
10  8 15.691032 B -1.789479e-01 15.869980
.. ..       ... .           ...       ...
查看更多
戒情不戒烟
4楼-- · 2019-08-09 06:00

Here's a version that works with base R:

modelit <- function(df) {
    mB <- with(df, lm(y ~ x, na.action = na.exclude))
    df$fit <- fitted(mB)
    df$res <- resid(mB)
    return(df)
}

dfs.with.preds <- lapply(split(X, as.factor(X$g)), modelit)
output <- Reduce(function(x, y) { rbind(x, y) }, dfs.with.preds)
查看更多
登录 后发表回答